    • 3. Patent application
    • Index Structure for Supporting Structural XML Queries
    • US20070271243A1
    • 2007-11-22
    • US11780095
    • 2007-07-19
    • Wei Fan, Haixun Wang, Philip Yu
    • G06F17/30
    • G06F17/30911; Y10S707/99933; Y10S707/99943
    • The present invention provides a ViST (or “virtual suffix tree”), which is a novel index structure for searching XML documents. By representing both XML documents and XML queries as structure-encoded sequences, it is shown that querying XML data is equivalent to finding (non-contiguous) subsequence matches. A variety of XML queries, including those with branches or wildcards (‘*’ and ‘//’), can be expressed as structure-encoded sequences. Unlike index methods that disassemble a query into multiple sub-queries and then join the results of these sub-queries to produce the final answers, ViST uses tree structures as the basic unit of query, which avoids expensive join operations. Furthermore, ViST provides a unified index on both the content and the structure of XML documents, and hence has a performance advantage over methods that index only content or only structure. ViST supports dynamic index updates, and it relies solely on B+Trees without using any specialized data structures that are not well supported by common database management systems (hereinafter referred to as “DBMSs”).
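The reduction from tree queries to subsequence matching can be illustrated with a short Python sketch. It assumes a ViST-style encoding in which a pre-order walk turns each node into a (symbol, prefix) pair, the prefix being the path of labels from the root to the node's parent. The tree representation, the labels, and the function names are hypothetical, and wildcards ('*', '//') and the B+Tree index are omitted. The point is only that a branching query matches when its encoded sequence occurs, not necessarily contiguously, inside the document's encoded sequence.

```python
from typing import List, Tuple

# Hypothetical minimal tree representation: (label, [child, child, ...]).
Node = Tuple[str, list]
Pair = Tuple[str, Tuple[str, ...]]


def encode(node: Node, prefix: Tuple[str, ...] = ()) -> List[Pair]:
    """Pre-order walk that emits one (symbol, prefix) pair per node, where
    prefix is the path of labels from the root down to the node's parent."""
    label, children = node
    seq: List[Pair] = [(label, prefix)]
    for child in children:
        seq.extend(encode(child, prefix + (label,)))
    return seq


def is_subsequence(query: List[Pair], data: List[Pair]) -> bool:
    """True if every query pair appears in data in the same order,
    not necessarily contiguously (greedy left-to-right scan)."""
    it = iter(data)
    return all(any(q == d for d in it) for q in query)


# Illustrative record: P has an S child (with N and L) and a B child (with L).
doc = ("P", [("S", [("N", []), ("L", [])]), ("B", [("L", [])])])
# Branching query: under P, both S/N and B/L must exist.
q_branch = ("P", [("S", [("N", [])]), ("B", [("L", [])])])
# Query whose path P/B/N does not occur in doc.
q_missing = ("P", [("B", [("N", [])])])

print(is_subsequence(encode(q_branch), encode(doc)))   # True
print(is_subsequence(encode(q_missing), encode(doc)))  # False
```

The greedy scan is enough to decide whether an exact subsequence exists; a real index would avoid scanning every document sequence, which this sketch does not model.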
    • 4. Patent application
    • System and method for sequencing XML documents for tree structure indexing
    • US20060161575A1
    • 2006-07-20
    • US11035889
    • 2005-01-14
    • Wei Fan, Haixun Wang, Philip Yu
    • G06F7/00
    • G06F17/30935; Y10S707/99933; Y10S707/99936
    • Sequence-based XML indexing aims at avoiding expensive join operations in query processing. It transforms structured XML data into sequences so that a structured query can be answered holistically through subsequence matching. Herein, the problem of query equivalence with respect to this transformation is addressed, and a performance-oriented principle for sequencing tree structures is introduced. With query equivalence, XML queries can be performed through subsequence matching without join operations, post-processing, or other special handling for problems such as false alarms. A class of sequencing methods for this purpose is identified, and a novel subsequence matching algorithm that observes query equivalence is presented. Also introduced is a performance-oriented principle to guide the sequencing of tree structures. For any given XML dataset, the principle finds an optimal sequencing strategy according to its schema and its data distribution; a novel method that realizes this principle is thus presented herein.
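The false-alarm problem mentioned in this abstract can be reproduced with naive pre-order sequencing. The following Python sketch is not the sequencing method described in the application; it assumes a ViST-style (symbol, prefix) encoding, and all names are hypothetical. A document whose root A has two separate B children, one holding D and one holding E, produces a sequence that contains the encoded query A/B[D][E] as a subsequence, even though no single B holds both children.

```python
from typing import List, Tuple

# Hypothetical minimal tree representation: (label, [child, child, ...]).
Node = Tuple[str, list]
Pair = Tuple[str, Tuple[str, ...]]


def encode(node: Node, prefix: Tuple[str, ...] = ()) -> List[Pair]:
    """Naive pre-order sequencing into (symbol, prefix) pairs."""
    label, children = node
    seq: List[Pair] = [(label, prefix)]
    for child in children:
        seq.extend(encode(child, prefix + (label,)))
    return seq


def is_subsequence(query: List[Pair], data: List[Pair]) -> bool:
    """Non-contiguous subsequence test (greedy left-to-right scan)."""
    it = iter(data)
    return all(any(q == d for d in it) for q in query)


def tree_match(q: Node, d: Node) -> bool:
    """Ground truth: q matches at d if the labels agree and every query
    child matches some child of d."""
    (q_label, q_kids), (d_label, d_kids) = q, d
    return q_label == d_label and all(
        any(tree_match(qc, dc) for dc in d_kids) for qc in q_kids
    )


# Data: A has two separate B children, one holding D and one holding E.
data = ("A", [("B", [("D", [])]), ("B", [("E", [])])])
# Query: a single B that holds both D and E.
query = ("A", [("B", [("D", []), ("E", [])])])

print(is_subsequence(encode(query), encode(data)))  # True: sequences match
print(tree_match(query, data))                      # False: a false alarm
```

A sequencing method that observes query equivalence has to rule out exactly this kind of spurious match without resorting to post-processing.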
    • 7. Patent application
    • Systems and methods for subspace clustering
    • US20050278324A1
    • 2005-12-15
    • US10858541
    • 2004-05-31
    • Wei Fan, Haixun Wang, Philip Yu
    • G06F7/00; G06K9/62
    • G06K9/6215; Y10S707/99936
    • Unlike traditional clustering methods that focus on grouping objects with similar values on a set of dimensions, clustering by pattern similarity finds objects that exhibit a coherent pattern of rise and fall in subspaces. Pattern-based clustering extends the concept of traditional clustering and benefits a wide range of applications, including e-commerce target marketing, bioinformatics (large-scale scientific data analysis), and automatic computing (web usage analysis). However, state-of-the-art pattern-based clustering methods (e.g., the pCluster algorithm) can only handle datasets of thousands of records, which makes them inappropriate for many real-life applications. Furthermore, besides their huge volume, many datasets are also characterized by sequentiality; for instance, customer purchase records and network event logs are usually modeled as data sequences. Hence, it becomes important to enable pattern-based clustering methods i) to handle large datasets and ii) to discover pattern similarity embedded in data sequences. There is presented herein a novel method that offers this capability.
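The abstract names pCluster as the state of the art without defining its criterion. As background, the following Python sketch checks the δ-pCluster condition as it is commonly stated in the pattern-similarity literature (an assumption, not taken from this abstract): for objects x, y and dimensions a, b, pScore = |(x_a - x_b) - (y_a - y_b)|, and a set of objects and dimensions is a δ-pCluster when every such pScore is at most δ. It only tests a given candidate; it is not the scalable, sequence-aware method claimed here, and the function names and data are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def p_score(x: Sequence[float], y: Sequence[float], a: int, b: int) -> float:
    """pScore of objects x and y on dimensions a and b: the gap between how
    x changes from a to b and how y changes from a to b."""
    return abs((x[a] - x[b]) - (y[a] - y[b]))


def is_delta_pcluster(data, objects, dims, delta: float) -> bool:
    """True if every pair of the chosen objects, on every pair of the chosen
    dimensions, has pScore <= delta."""
    return all(
        p_score(data[i], data[j], a, b) <= delta
        for i, j in combinations(objects, 2)
        for a, b in combinations(dims, 2)
    )


# Hypothetical data: rows are objects, columns are dimensions.
data = [
    [1, 4, 2, 10],
    [11, 14, 12, 0],  # row 0 shifted by +10 on dimensions 0-2
    [5, 9, 6, 3],
]

print(is_delta_pcluster(data, [0, 1], [0, 1, 2], delta=0))     # True
print(is_delta_pcluster(data, [0, 1, 2], [0, 1, 2], delta=1))  # True
print(is_delta_pcluster(data, [0, 1], [0, 1, 2, 3], delta=1))  # False
```

Rows 0 and 1 are far apart in absolute value but rise and fall identically on dimensions 0 to 2, which is the kind of coherence that value-based clustering misses.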
    • 8. Patent application
    • System and method for mining time-changing data streams
    • US20050278322A1
    • 2005-12-15
    • US10857030
    • 2004-05-28
    • Wei Fan, Haixun Wang, Philip Yu
    • G06F7/00
    • G06N99/005; Y10S707/99943; Y10S707/99945
    • A general framework for mining concept-drifting data streams using weighted ensemble classifiers. An ensemble of classification models, such as C4.5, RIPPER, naive Bayesian, etc., is trained from sequential chunks of the data stream. The classifiers in the ensemble are judiciously weighted based on their expected classification accuracy on the test data in the time-evolving environment. Thus, the ensemble approach improves both the efficiency of learning the model and the accuracy of classification. An empirical study shows that the proposed methods have a substantial advantage over single-classifier approaches in prediction accuracy, and that the ensemble framework is effective for a variety of classification models.
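One way to make "weighted based on expected classification accuracy" concrete is sketched below in Python. The weighting rule is an assumption: each classifier's weight is the mean squared error of a baseline that predicts only the class priors, minus the classifier's own mean squared error, both measured on the most recent chunk, with negative weights clipped to zero. The two classifiers are hypothetical toy stand-ins for models such as C4.5, RIPPER, or naive Bayes trained on earlier chunks.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: a classifier maps an example to {class: probability}.
Classifier = Callable[[float], Dict[str, float]]
Chunk = Sequence[Tuple[float, str]]  # (example, true label)


def random_mse(labels: List[str]) -> float:
    """Expected squared error of a classifier that just predicts the class
    priors of this chunk: sum over classes of p(c) * (1 - p(c))**2."""
    n = len(labels)
    priors = [labels.count(c) / n for c in set(labels)]
    return sum(p * (1 - p) ** 2 for p in priors)


def model_mse(clf: Classifier, chunk: Chunk) -> float:
    """Mean of (1 - probability assigned to the true class)**2 over the chunk."""
    return sum((1 - clf(x).get(y, 0.0)) ** 2 for x, y in chunk) / len(chunk)


def weigh(classifiers: List[Classifier], chunk: Chunk) -> List[float]:
    """Weight each classifier by how much it beats the prior-only baseline on
    the most recent chunk; classifiers no better than the baseline get 0."""
    baseline = random_mse([y for _, y in chunk])
    return [max(0.0, baseline - model_mse(clf, chunk)) for clf in classifiers]


def predict(classifiers: List[Classifier], weights: List[float], x: float) -> str:
    """Weighted sum of class probabilities; returns the top-scoring class."""
    scores: Dict[str, float] = {}
    for clf, w in zip(classifiers, weights):
        for c, p in clf(x).items():
            scores[c] = scores.get(c, 0.0) + w * p
    return max(scores, key=scores.get)


def old(x: float) -> Dict[str, float]:
    # Toy model of an outdated concept: "pos" when x > 0.
    return {"pos": 1.0, "neg": 0.0} if x > 0 else {"pos": 0.0, "neg": 1.0}


def new(x: float) -> Dict[str, float]:
    # Toy model of the current concept: "pos" when x > 5.
    return {"pos": 0.9, "neg": 0.1} if x > 5 else {"pos": 0.1, "neg": 0.9}


latest_chunk = [(1, "neg"), (3, "neg"), (7, "pos"), (9, "pos")]
weights = weigh([old, new], latest_chunk)
print(weights)                          # old gets 0.0, new gets about 0.24
print(predict([old, new], weights, 2))  # neg
```

Because the outdated classifier does no better than the baseline on the newest chunk, it receives zero weight and the ensemble follows the current concept; an unweighted sum would instead label x = 2 as pos.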
    • 9. Patent application
    • System and method for adaptive pruning
    • US20050131873A1
    • 2005-06-16
    • US10737123
    • 2003-12-16
    • Wei Fan, Haixun Wang, Philip Yu
    • G06F17/30
    • G06F17/30539; G06F17/30598
    • Disclosed is a method and structure for searching data in databases using an ensemble of models. First, the invention performs training. This training orders the models within the ensemble by prediction accuracy and joins different numbers of models together to form sub-ensembles, adding models to each sub-ensemble in order of prediction accuracy. Next in the training process, the invention calculates a confidence value for each of the sub-ensembles. The confidence is a measure of how closely results from the sub-ensemble will match results from the full ensemble. The size of each sub-ensemble varies depending on the level of confidence, whereas the size of the ensemble is fixed. After training, the invention can make a prediction. First, the invention selects a sub-ensemble that meets a given level of confidence: as the level of confidence is raised, a sub-ensemble with more models is selected, and as it is lowered, a sub-ensemble with fewer models is selected. Finally, the invention applies the selected sub-ensemble, in place of the ensemble, to an example to make a prediction.
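The training and selection steps described in this abstract map onto a few lines of Python. This is a minimal sketch under stated assumptions: models vote by unweighted majority, the prediction accuracies are given rather than measured, and confidence is estimated as the fraction of training examples on which a top-k sub-ensemble agrees with the full ensemble. All names and the toy models are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: a model maps an example to a class label.
Model = Callable[[int], str]


def vote(models: Sequence[Model], x: int) -> str:
    """Unweighted majority vote; ties go to the label seen first."""
    return Counter(m(x) for m in models).most_common(1)[0][0]


def train(models: List[Model], accuracies: List[float],
          examples: Sequence[int]) -> Tuple[List[Model], List[float]]:
    """Rank models by accuracy (best first). For each k, the sub-ensemble is
    the top-k models; its confidence is the fraction of examples on which its
    vote agrees with the vote of the full ensemble."""
    order = sorted(range(len(models)), key=lambda i: accuracies[i], reverse=True)
    ranked = [models[i] for i in order]
    full = [vote(ranked, x) for x in examples]
    confidences = []
    for k in range(1, len(ranked) + 1):
        agree = sum(vote(ranked[:k], x) == f for x, f in zip(examples, full))
        confidences.append(agree / len(examples))
    return ranked, confidences


def select(ranked: List[Model], confidences: List[float], level: float) -> List[Model]:
    """Smallest sub-ensemble whose confidence meets the requested level
    (the full ensemble always agrees with itself, so this never fails)."""
    for k, conf in enumerate(confidences, start=1):
        if conf >= level:
            return ranked[:k]
    return ranked


def m_a(x: int) -> str:
    return "pos" if x > 1 else "neg"


def m_b(x: int) -> str:
    return "pos" if x > 0 else "neg"


def m_c(x: int) -> str:
    return "pos"


# Assumed accuracies, listed in the same (unsorted) order as the models.
ranked, conf = train([m_c, m_a, m_b], [0.6, 0.9, 0.8], [-1, 1, 2, 3])
print(conf)                             # [0.75, 0.75, 1.0]
print(len(select(ranked, conf, 0.7)))   # 1
print(len(select(ranked, conf, 0.9)))   # 3
```

Selecting the smallest sub-ensemble that reaches the requested level reproduces the behavior in the abstract: raising the level from 0.7 to 0.9 grows the chosen sub-ensemble from one model to all three.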
    • 10. Patent application
    • Index structure for supporting structural XML queries
    • US20050114314A1
    • 2005-05-26
    • US10723206
    • 2003-11-26
    • Wei Fan, Haixun Wang, Philip Yu
    • G06F17/30
    • G06F17/30911; Y10S707/99933; Y10S707/99943
    • The present invention provides a ViST (or “virtual suffix tree”), which is a novel index structure for searching XML documents. By representing both XML documents and XML queries as structure-encoded sequences, it is shown that querying XML data is equivalent to finding (non-contiguous) subsequence matches. A variety of XML queries, including those with branches or wildcards (‘*’ and ‘//’), can be expressed as structure-encoded sequences. Unlike index methods that disassemble a query into multiple sub-queries and then join the results of these sub-queries to produce the final answers, ViST uses tree structures as the basic unit of query, which avoids expensive join operations. Furthermore, ViST provides a unified index on both the content and the structure of XML documents, and hence has a performance advantage over methods that index only content or only structure. ViST supports dynamic index updates, and it relies solely on B+Trees without using any specialized data structures that are not well supported by common database management systems (hereinafter referred to as “DBMSs”).