    • 32. Invention grant
    • Method and system for generating a decision-tree classifier independent of system memory size
    • US5799311A
    • 1998-08-25
    • US646893
    • 1996-05-08
    • Rakesh Agrawal; Manish Mehta; John Christopher Shafer
    • Rakesh Agrawal; Manish Mehta; John Christopher Shafer
    • G06F17/30
    • G06F17/30705; G06F17/30625; G06F2216/03; Y10S707/99943
    • A method and system are disclosed for generating a decision-tree classifier from a training set of records, independent of the system memory size. The method comprises the steps of: generating an attribute list for each attribute of the records, sorting the attribute lists for numeric attributes, and generating a decision tree by repeatedly partitioning the records using the attribute lists. For each node, split points are evaluated to determine the best split test for partitioning the records at the node. Preferably, a gini index and class histograms are used in determining the best splits. The gini index indicates how well a split point separates the records while the class histograms reflect the class distribution of the records at the node. Also, a hash table is built as the attribute list of the split attribute is divided among the child nodes, which is then used for splitting the remaining attribute lists of the node. The created tree is further pruned based on the MDL principle, which encodes the tree and split tests in an MDL-based code, and determines whether to prune and how to prune each node based on the code length of the node.
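The steps named in the abstract (per-attribute lists, sorted numeric lists, gini-based split evaluation with class histograms, and a hash table for dividing the remaining lists) can be illustrated with a minimal in-memory sketch. It is not the patented implementation: the function names, the tuple layout `(value, class, record id)`, the midpoint threshold, and the sample data are assumptions made for illustration. Only numeric attributes are handled, the gini index is taken in its usual form 1 - Σ p_j², and MDL-based pruning is not shown.

```python
from collections import Counter


def gini(hist, n):
    """Gini index of a class histogram covering n records: 1 - sum(p_j^2)."""
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in hist.values())


def build_attribute_lists(records, labels):
    """Build one list per attribute of (value, class, record id), sorted by value."""
    n_attrs = len(records[0])
    return [
        sorted(
            ((rec[a], labels[rid], rid) for rid, rec in enumerate(records)),
            key=lambda entry: entry[0],
        )
        for a in range(n_attrs)
    ]


def best_numeric_split(attr_list):
    """Scan a sorted attribute list once; return (weighted gini, threshold)."""
    n = len(attr_list)
    below = Counter()                                # class histogram left of the cursor
    above = Counter(cls for _, cls, _ in attr_list)  # class histogram right of the cursor
    best = (float("inf"), None)
    for i in range(n - 1):
        value, cls, _ = attr_list[i]
        below[cls] += 1
        above[cls] -= 1
        next_value = attr_list[i + 1][0]
        if value == next_value:
            continue  # no split point between equal values
        n_below = i + 1
        n_above = n - n_below
        score = (n_below * gini(below, n_below) + n_above * gini(above, n_above)) / n
        if score < best[0]:
            best = (score, (value + next_value) / 2)  # midpoint is an assumed convention
    return best


def split_lists(attr_lists, split_attr, threshold):
    """Divide the split attribute's list, noting each record's side in a hash
    table, then split every attribute list by probing that table."""
    goes_left = {rid: value <= threshold for value, _, rid in attr_lists[split_attr]}
    left = [[e for e in lst if goes_left[e[2]]] for lst in attr_lists]
    right = [[e for e in lst if not goes_left[e[2]]] for lst in attr_lists]
    return left, right


# Hypothetical training set: two numeric attributes per record, one class label each.
records = [(23, 1.0), (40, 3.0), (55, 2.0), (30, 4.0)]
labels = ["high", "low", "low", "high"]

attr_lists = build_attribute_lists(records, labels)
candidates = [best_numeric_split(lst) for lst in attr_lists]
split_attr = min(range(len(candidates)), key=lambda a: candidates[a][0])
left, right = split_lists(attr_lists, split_attr, candidates[split_attr][1])
```

Because the list comprehensions in `split_lists` preserve order, the child lists stay sorted and need no re-sorting at deeper nodes. A disk-resident variant would stream each list instead of holding it in memory; that part is left out of this sketch.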