![Unbalanced data set processing method and device based on machine learning](/CN/2018/1/268/images/201811341167.jpg)
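Basic information:
- Patent title: 一种基于机器学习的非平衡数据集的处理方法和装置
- Patent title (English): Unbalanced data set processing method and device based on machine learning
- Application number: CN201811341167.9 Filing date: 2018-11-12
- Publication (announcement) number: CN109635839A Publication (announcement) date: 2019-04-16
- Inventors: 王栋, 韩庆芝, 王波, 玄佳兴, 王俊生, 李丽丽, 韩文慧, 吕梓童, 张宏廷
- Applicants: 国家电网有限公司, 国网电子商务有限公司, 国网雄安金融科技有限公司
- Applicant address: No. 86 West Chang'an Avenue, Xicheng District, Beijing
- Patentees: 国家电网有限公司, 国网电子商务有限公司, 国网雄安金融科技有限公司
- Current patentees: 国家电网有限公司, 国网数字科技控股有限公司, 国网雄安金融科技集团有限公司
- Current patentee address: No. 86 West Chang'an Avenue, Xicheng District, Beijing
- Patent agency: 北京中博世达专利商标代理有限公司
- Agent: 申健
- Main classification: G06K9/62
- IPC classification: G06K9/62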
Embodiments of the invention disclose a machine-learning-based method and device for processing unbalanced data sets. The invention relates to the technical field of data processing and can solve the problem of distribution marginalization that arises when 'artificial' samples are synthesized with the SMOTE algorithm. The method comprises: generating a central sample from a first sample set containing a plurality of majority-class image samples and an initial second sample set containing a plurality of minority-class image samples, where the majority-class and minority-class image samples each have N-dimensional attributes, the central sample is composed of the average value, in each dimension, of the attributes of the majority-class and minority-class image samples, and N is greater than or equal to 1; and performing random linear interpolation between the central sample and at least one minority-class image sample to generate newly added minority-class samples, and obtaining a second sample set updated with the newly added minority-class samples.
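
The two steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `central_sample` and `oversample_minority` are hypothetical, the central sample is assumed to be the per-dimension mean over the union of both sample sets, and the interpolation weight is assumed to be drawn uniformly from [0, 1). The abstract does not specify any of these details.

```python
import random
from typing import List, Optional, Sequence

Sample = List[float]


def central_sample(majority: Sequence[Sample], minority: Sequence[Sample]) -> Sample:
    """Per-dimension mean over all majority and minority samples (assumed reading)."""
    combined = list(majority) + list(minority)
    n_dims = len(combined[0])
    return [sum(s[d] for s in combined) / len(combined) for d in range(n_dims)]


def oversample_minority(
    majority: Sequence[Sample],
    minority: Sequence[Sample],
    n_new: int,
    seed: Optional[int] = None,
) -> List[Sample]:
    """Return the minority set extended with n_new interpolated samples."""
    rng = random.Random(seed)
    center = central_sample(majority, minority)
    new_samples = []
    for _ in range(n_new):
        base = rng.choice(list(minority))  # pick one existing minority sample
        t = rng.random()                   # assumed weight, uniform in [0, 1)
        # Point on the segment from the minority sample toward the central sample.
        new_samples.append([b + t * (c - b) for b, c in zip(base, center)])
    return list(minority) + new_samples


if __name__ == "__main__":
    majority = [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]]
    minority = [[10.0, 0.0]]
    print(oversample_minority(majority, minority, n_new=3, seed=0))
```

Because every synthetic point lies on a segment between an existing minority sample and the central sample, new samples are pulled toward the middle of the data rather than toward the edge of the minority distribution, which is the marginalization effect the abstract attributes to plain SMOTE.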
Published/granted documents:
- CN109635839B Unbalanced data set processing method and device based on machine learning. Publication/grant date: 2020-07-14
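IPC structure map:
G | Physics |
--G06 | Computing; calculating; counting |
----G06K | Recognition of data; presentation of data; record carriers; handling record carriers |
------G06K9/00 | Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints |
--------G06K9/62 | . Methods or arrangements for recognition using electronic means |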