    • 2. Granted invention patent
    • Generating data from imbalanced training data sets
    • US09224104B2
    • 2015-12-29
    • US14034797
    • 2013-09-24
    • International Business Machines Corporation
    • Ching-Yung Lin; Wan-Yi Lin; Yinglong Xia
    • G06F15/18; G06N99/00
    • G06N99/005; G06F17/50
    • Injecting generated data samples into a minority data class of an imbalanced training data set is provided. In response to receiving an input to balance the imbalanced training data set that includes a majority data class and the minority data class, a set of data samples is generated for the minority data class. A distance is calculated from each data sample in the set of generated data samples to a center of a kernel that includes a set of data samples of the majority data class. Each data sample in the set of generated data samples is stored within a corresponding distance score bucket based on the calculated distance of a data sample. Generated data samples are selected from a number of highest ranking distance score buckets. The generated data samples selected from the number of highest ranking distance score buckets are injected into the minority data class.
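The abstract above describes a generate, score, bucket, select, and inject pipeline. The following is a minimal Python sketch of that flow, not the patented implementation. Several details are assumptions that the abstract does not specify: candidates are generated by Gaussian jitter around existing minority samples, the kernel center is taken as the centroid of the majority samples, buckets are equal-width distance bins, "highest ranking" is read as "farthest from the majority center", and injection stops once the classes are the same size. All function and parameter names are hypothetical.

```python
import math
import random


def centroid(points):
    """Mean point; stands in for the kernel center (an assumption)."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]


def generate_candidates(minority, n, scale, rng):
    """Assumed generator: Gaussian jitter around randomly chosen minority samples."""
    return [[x + rng.gauss(0.0, scale) for x in rng.choice(minority)]
            for _ in range(n)]


def bucket_by_distance(candidates, center, n_buckets):
    """Place each candidate in an equal-width bucket by its distance to center.

    Buckets are ordered from nearest (index 0) to farthest (last index).
    """
    dists = [math.dist(c, center) for c in candidates]
    lo, hi = min(dists), max(dists)
    width = (hi - lo) / n_buckets or 1.0  # avoid zero width when all distances match
    buckets = [[] for _ in range(n_buckets)]
    for c, d in zip(candidates, dists):
        idx = min(int((d - lo) / width), n_buckets - 1)
        buckets[idx].append(c)
    return buckets


def balance(majority, minority, n_candidates=200, n_buckets=10, top_k=3, seed=0):
    """Return the minority class with selected generated samples injected."""
    rng = random.Random(seed)
    center = centroid(majority)
    candidates = generate_candidates(minority, n_candidates, 0.5, rng)
    buckets = bucket_by_distance(candidates, center, n_buckets)
    # Assumed ranking: the farthest buckets rank highest, farthest first.
    selected = [c for b in reversed(buckets[-top_k:]) for c in b]
    # Assumed stopping rule: inject only until the class sizes match.
    needed = max(len(majority) - len(minority), 0)
    return minority + selected[:needed]
```

Calling `balance(majority, minority)` with two lists of equal-length coordinate lists returns the original minority samples followed by the injected ones. Preferring candidates far from the majority center is one plausible reading of the ranking step; the opposite ordering would only require changing the selection line.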
    • 6. Invention patent application
    • GENERATING DATA FROM IMBALANCED TRAINING DATA SETS
    • US20150088791A1
    • 2015-03-26
    • US14034797
    • 2013-09-24
    • International Business Machines Corporation
    • Ching-Yung Lin; Wan-Yi Lin; Yinglong Xia
    • G06N99/00
    • G06N99/005; G06F17/50
    • Injecting generated data samples into a minority data class of an imbalanced training data set is provided. In response to receiving an input to balance the imbalanced training data set that includes a majority data class and the minority data class, a set of data samples is generated for the minority data class. A distance is calculated from each data sample in the set of generated data samples to a center of a kernel that includes a set of data samples of the majority data class. Each data sample in the set of generated data samples is stored within a corresponding distance score bucket based on the calculated distance of a data sample. Generated data samples are selected from a number of highest ranking distance score buckets. The generated data samples selected from the number of highest ranking distance score buckets are injected into the minority data class.
    • 8. Invention patent application
    • GENERATING DATA FROM IMBALANCED TRAINING DATA SETS
    • US20150356464A1
    • 2015-12-10
    • US14831434
    • 2015-08-20
    • International Business Machines Corporation
    • Ching-Yung Lin; Wan-Yi Lin; Yinglong Xia
    • G06N99/00; G06F17/50
    • G06N99/005; G06F17/50
    • Determining a number of kernels within a model is provided. A number of kernels that include data samples of a majority data class of an imbalanced training data set is determined based on a set of generated artificial data samples for a minority data class of the imbalanced training data set. The number of kernels within the model is generated based on the set of generated artificial data samples. A likelihood of the set of generated artificial data samples being included in the majority data class of the imbalanced training data set is calculated. Parameters of each kernel in the number of kernels are updated based on the likelihood of the set of generated artificial data samples being included in the majority data class of the imbalanced training data set. Each kernel in the number of kernels is adjusted based on the updated parameters.
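The likelihood and kernel-update steps in the abstract above can be illustrated with a small Python sketch. This is not the patented method: the abstract gives neither the kernel form nor the update rule, so the sketch assumes isotropic Gaussian kernels with equal mixture weights and a hypothetical rule that narrows each kernel by a factor that grows with how strongly the artificial minority samples fall near its center. It also takes the number of kernels as given rather than deriving it. All names (`gaussian_pdf`, `majority_likelihood`, `update_kernels`, `rate`, `min_sigma`) are illustrative.

```python
import math


def gaussian_pdf(x, center, sigma):
    """Density of an isotropic Gaussian kernel at point x."""
    dims = len(x)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, center))
    norm = (2.0 * math.pi * sigma ** 2) ** (dims / 2.0)
    return math.exp(-sq_dist / (2.0 * sigma ** 2)) / norm


def majority_likelihood(samples, kernels):
    """Mean density of the samples under an equal-weight mixture of kernels."""
    total = 0.0
    for x in samples:
        total += sum(gaussian_pdf(x, k["center"], k["sigma"])
                     for k in kernels) / len(kernels)
    return total / len(samples)


def update_kernels(samples, kernels, rate=0.5, min_sigma=1e-3):
    """Assumed update rule: narrow kernels that overlap the artificial samples.

    Returns new kernel dictionaries; the input kernels are left unchanged.
    """
    updated = []
    for k in kernels:
        own = sum(gaussian_pdf(x, k["center"], k["sigma"])
                  for x in samples) / len(samples)
        peak = gaussian_pdf(k["center"], k["center"], k["sigma"])
        overlap = own / peak  # 0 = no overlap, 1 = samples sit on the center
        sigma = max(k["sigma"] * (1.0 - rate * overlap), min_sigma)
        updated.append({"center": list(k["center"]), "sigma": sigma})
    return updated
```

Each kernel is a dictionary with a `center` coordinate list and a scalar `sigma`. A kernel far from every artificial sample keeps essentially the same width, while one centered among them shrinks by up to the fraction given by `rate`.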