Patent No. CN201811341167.9 | Unbalanced data set processing method and device based on machine learning

Patent title: Unbalanced data set processing method and device based on machine learning
Patent title (original): 一种基于机器学习的非平衡数据集的处理方法和装置
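Application number: CN201811341167.9 Filing date: 2018-11-12
Publication number: CN109635839A Publication date: 2019-04-16
Inventors: 王栋, 韩庆芝, 王波, 玄佳兴, 王俊生, 李丽丽, 韩文慧, 吕梓童, 张宏廷
Applicants: 国家电网有限公司, 国网电子商务有限公司, 国网雄安金融科技有限公司
Applicant address: No. 86 West Chang'an Street, Xicheng District, Beijing
Patentees: 国家电网有限公司, 国网电子商务有限公司, 国网雄安金融科技有限公司
Current patentees: 国家电网有限公司, 国网数字科技控股有限公司, 国网雄安金融科技集团有限公司
Current patentee address: No. 86 West Chang'an Street, Xicheng District, Beijing
Agency: 北京中博世达专利商标代理有限公司
Agent: 申健
Main classification: G06K9/62
IPC classification: G06K9/62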

Abstract:

Embodiments of the invention disclose a machine-learning-based method and device for processing unbalanced data sets. The invention relates to the technical field of data processing and can solve the distribution marginalization problem that arises when the SMOTE algorithm synthesizes "artificial" samples. The processing method comprises: generating a central sample from a first sample set containing a plurality of majority class image samples and an initial second sample set containing a plurality of minority class image samples, wherein the majority class image samples and the minority class image samples each contain N-dimensional attributes, the central sample is composed of the per-dimension average of the attributes of the plurality of majority class image samples and the plurality of minority class image samples, and N ≥ 1; and performing random linear interpolation between the central sample and at least one minority class image sample to generate new minority class samples, thereby obtaining a second sample set updated with the new minority class samples.
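
The two steps named in the abstract (building a central sample, then interpolating between it and minority samples) can be illustrated with a minimal Python sketch. This is not the patent's reference implementation. The function names, the pooled mean over both sample sets, the sampling of minority samples with replacement, and the interpolation coefficient drawn uniformly from [0, 1) are all assumptions made for illustration; the abstract does not specify them.

```python
import random


def central_sample(majority, minority):
    """Per-dimension mean over all samples of both classes.

    Assumption: the mean is taken over the pooled set of majority and
    minority samples; the abstract does not spell out the weighting.
    """
    pooled = list(majority) + list(minority)
    n_dims = len(pooled[0])
    return [sum(s[d] for s in pooled) / len(pooled) for d in range(n_dims)]


def oversample_minority(majority, minority, n_new, seed=None):
    """Return the minority set extended with n_new synthetic samples.

    Each synthetic sample lies on the line segment between a randomly
    chosen minority sample and the central sample.
    """
    rng = random.Random(seed)
    center = central_sample(majority, minority)
    updated = [list(s) for s in minority]
    for _ in range(n_new):
        base = rng.choice(minority)   # hypothetical choice: sample with replacement
        lam = rng.random()            # hypothetical choice: coefficient in [0, 1)
        updated.append([b + lam * (c - b) for b, c in zip(base, center)])
    return updated


# Toy data with N = 2 attributes per sample.
majority = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
minority = [[5.0, 5.0], [6.0, 5.0]]
updated_minority = oversample_minority(majority, minority, n_new=2, seed=0)
```

Interpolating toward the central sample, rather than toward a neighbouring minority sample as in standard SMOTE, pulls the synthetic points toward the overall data centre, which is how the abstract proposes to counter distribution marginalization.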

CN109635839B Unbalanced data set processing method and device based on machine learning. Publication/grant date: 2020-07-14


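G	Physics
--G06	Computing; calculating; counting
----G06K	Recognition of data; presentation of data; record carriers; handling record carriers
------G06K9/00	Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
--------G06K9/62	.Methods or arrangements for recognition using electronic means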

Invention publication CN109635839A Unbalanced data set processing method and device based on machine learning (status: in force)
