    • 1. Granted invention patent
    • System and method for determining numerical representations for categorical data fields
    • US07272590B2
    • 2007-09-18
    • US10383182
    • 2003-03-06
    • Andreas Arning; Christoph Lingenfelder; Gregor Meyer; Dieter Roller; Swen Wohland
    • G06F17/30
    • G06F17/30595; H03M7/14; Y10S707/99932; Y10S707/99942
    • A system and method determine numerical representations for categorical data fields by taking advantage of the redundancy of the data records to allow automatic discovery of an order of the categories. A categorical data field is recoded by creating separate tables for each numerical data field occurring in the data records. The separate tables are sorted according to the numerical values of the respective data fields. The recoding of the categories is performed based on the average sort order of occurrences of the category in a specific sorted table. The standard deviation of the numerical codes provided by the categories is calculated for each of the separate recoding tables. The recoding table with the maximum standard deviation is selected as the recoding table to perform the recoding of the categories contained in the respective categorical data field of the data records. A plausibility check is performed for the selected recoding table by excluding the numerical data field that has formed the basis for the sorting of the respective table and recreating the recoding table from the data records. The resulting recoding table and the original recoding table are compared. Resulting recoding tables that are similar indicate a high level of confidence that the originally selected recoding table is optimal.
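The procedure in the abstract can be illustrated with a minimal Python sketch. It is one reading of the abstract, not the patented implementation. The function names, the record layout (a list of dicts), the use of the population standard deviation, and the similarity test (same category ordering in both tables) are all assumptions made for illustration. The abstract does not say how ties in the sort or the comparison of tables are handled.

```python
from statistics import mean, pstdev


def recoding_table(records, cat_field, num_field):
    """Map each category to the average sort position of its records
    after sorting all records by num_field (hypothetical helper)."""
    ordered = sorted(records, key=lambda r: r[num_field])
    positions = {}
    for rank, rec in enumerate(ordered):
        positions.setdefault(rec[cat_field], []).append(rank)
    return {cat: mean(ranks) for cat, ranks in positions.items()}


def select_recoding(records, cat_field, num_fields):
    """Build one recoding table per numerical field and pick the one
    whose category codes have the largest standard deviation."""
    tables = {f: recoding_table(records, cat_field, f) for f in num_fields}
    best = max(num_fields, key=lambda f: pstdev(list(tables[f].values())))
    return best, tables[best]


def plausibility_check(records, cat_field, num_fields):
    """Exclude the selected field, redo the selection on the remaining
    fields, and compare the two tables. Assumption: 'similar' means the
    categories come out in the same order."""
    best, table = select_recoding(records, cat_field, num_fields)
    remaining = [f for f in num_fields if f != best]
    _, alt = select_recoding(records, cat_field, remaining)
    same_order = sorted(table, key=table.get) == sorted(alt, key=alt.get)
    return best, table, same_order


# Made-up example data: "size" is categorical, the rest are numerical.
records = [
    {"size": "S", "price": 10, "weight": 1.0, "noise": 5},
    {"size": "S", "price": 12, "weight": 1.2, "noise": 1},
    {"size": "M", "price": 20, "weight": 2.0, "noise": 4},
    {"size": "M", "price": 22, "weight": 3.1, "noise": 2},
    {"size": "L", "price": 30, "weight": 3.0, "noise": 3},
    {"size": "L", "price": 32, "weight": 3.3, "noise": 6},
]

best, table, plausible = plausibility_check(
    records, "size", ["price", "weight", "noise"]
)
print(best, table, plausible)
```

In this toy data, sorting by price separates the three sizes cleanly, so its recoding table has the widest spread of codes. The weight field, which is correlated with price, reproduces the same category order once price is excluded, which is the kind of agreement the abstract treats as a sign of confidence.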