    • 23. Granted invention patent
    • Method and system for performing proximity joins on high-dimensional data points in parallel
    • US5884320A
    • 1999-03-16
    • US920331
    • 1997-08-20
    • Rakesh Agrawal; John Christopher Shafer
    • Rakesh Agrawal; John Christopher Shafer
    • G06F17/30
    • G06F17/30592; G06F17/30445; G06F17/30498; Y10S707/99932; Y10S707/99945; Y10S707/99948
    • A method and system for performing spatial proximity joins on high-dimensional points representing data objects of a database in parallel in a multiprocessor system. The method comprises the steps of: partitioning the data points among the processors; creating index structures for the data points of the processors in parallel; assigning the join operations to the processors using the index structures; and simultaneously redistributing and joining the data points in the processors in parallel based on a predetermined joining condition. An efficient data structure, the ε-K-D-B tree, is used to provide fast access to the high-dimensional points and to minimize system storage requirements. The invention achieves fast response time and requires minimum storage space by having structurally identical indices among the processors, assigning workload based on the join costs, and redistributing the data points among the processors while joining the data whenever possible.
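To make the idea of a proximity join under a predetermined joining condition concrete, here is a minimal single-process sketch. It does not implement the patented ε-K-D-B tree or the parallel redistribution step. As a stand-in it uses a plain uniform grid with cell width ε, and it assumes the joining condition is "Euclidean distance at most ε". The function name `epsilon_join` and its parameters are hypothetical, chosen only for this illustration.

```python
import math
from collections import defaultdict
from itertools import product


def epsilon_join(points, eps):
    """Return sorted index pairs (i, j), i < j, with Euclidean distance <= eps.

    Stand-in technique: a uniform grid, NOT the patent's epsilon-K-D-B tree.
    Assumption: the joining condition is Euclidean distance <= eps.
    """
    # Bucket every point by the grid cell (side length eps) that contains it.
    grid = defaultdict(list)
    for idx, p in enumerate(points):
        cell = tuple(math.floor(x / eps) for x in p)
        grid[cell].append(idx)

    dims = len(points[0]) if points else 0
    # Two points within eps of each other lie in the same or adjacent cells,
    # so only the 3**dims surrounding cells need to be examined.
    offsets = list(product((-1, 0, 1), repeat=dims))

    pairs = []
    for cell, members in grid.items():
        for off in offsets:
            neighbor = tuple(c + o for c, o in zip(cell, off))
            # .get() avoids creating empty cells while iterating over the grid.
            for i in members:
                for j in grid.get(neighbor, ()):
                    if i < j and math.dist(points[i], points[j]) <= eps:
                        pairs.append((i, j))
    return sorted(pairs)
```

For example, `epsilon_join([(0.0, 0.0), (0.5, 0.5), (3.0, 3.0), (3.2, 3.1)], 1.0)` returns `[(0, 1), (2, 3)]`. The number of neighboring cells grows as 3^d with the dimension d, which is why a plain grid like this is practical only in low dimensions and why a dedicated index for high-dimensional points is of interest.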
    • 26. Granted invention patent
    • Method and system for generating a decision-tree classifier independent of system memory size
    • US5799311A
    • 1998-08-25
    • US646893
    • 1996-05-08
    • Rakesh Agrawal; Manish Mehta; John Christopher Shafer
    • Rakesh Agrawal; Manish Mehta; John Christopher Shafer
    • G06F17/30
    • G06F17/30705; G06F17/30625; G06F2216/03; Y10S707/99943
    • A method and system are disclosed for generating a decision-tree classifier from a training set of records, independent of the system memory size. The method comprises the steps of: generating an attribute list for each attribute of the records, sorting the attribute lists for numeric attributes, and generating a decision tree by repeatedly partitioning the records using the attribute lists. For each node, split points are evaluated to determine the best split test for partitioning the records at the node. Preferably, a gini index and class histograms are used in determining the best splits. The gini index indicates how well a split point separates the records while the class histograms reflect the class distribution of the records at the node. Also, a hash table is built as the attribute list of the split attribute is divided among the child nodes, which is then used for splitting the remaining attribute lists of the node. The created tree is further pruned based on the MDL principle, which encodes the tree and split tests in an MDL-based code, and determines whether to prune and how to prune each node based on the code length of the node.
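The split evaluation described in the abstract (a sorted attribute list scanned once, with class histograms feeding a gini index) can be sketched as follows. This is an illustrative in-memory version, not the patented method. It assumes the score of a split is the size-weighted gini index of the two partitions, and that candidate thresholds are midpoints between consecutive distinct values. The names `gini` and `best_numeric_split` are hypothetical.

```python
from collections import Counter


def gini(hist, total):
    """Gini index 1 - sum(p_c ** 2) of a class histogram holding `total` records."""
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in hist.values())


def best_numeric_split(attribute_list):
    """Find the best test `value <= threshold` for one numeric attribute.

    attribute_list: iterable of (value, class_label) pairs.
    Returns (threshold, weighted_gini), or None if no split is possible.
    Assumption: thresholds are midpoints between consecutive distinct values.
    """
    entries = sorted(attribute_list, key=lambda entry: entry[0])
    n = len(entries)

    # Class histograms for records below and above the current split point.
    below = Counter()
    above = Counter(label for _, label in entries)

    best = None
    for k in range(n - 1):
        value, label = entries[k]
        # Move one record from the "above" histogram to the "below" histogram.
        below[label] += 1
        above[label] -= 1

        next_value = entries[k + 1][0]
        if value == next_value:
            continue  # equal values cannot be separated by a threshold

        n_below = k + 1
        n_above = n - n_below
        score = (n_below / n) * gini(below, n_below) \
            + (n_above / n) * gini(above, n_above)
        if best is None or score < best[1]:
            best = ((value + next_value) / 2, score)
    return best
```

With the attribute list `[(1, "a"), (2, "a"), (3, "b"), (4, "b")]`, the function returns `(2.5, 0.0)`: the threshold 2.5 separates the two classes perfectly, so the weighted gini index is zero. Updating the two histograms incrementally is what lets every candidate split be scored in a single pass over the sorted list.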