    • 8. Granted invention patent
    • Routing XML queries
    • US07664806B1
    • 2010-02-16
    • US10830285
    • 2004-04-22
    • Nikolaos Koudas; Divesh Srivastava; Michael Rabinovich
    • Nikolaos Koudas; Divesh Srivastava; Michael Rabinovich
    • G06F7/00; G06F15/16
    • G06F17/30929; G06F17/30545
    • A vast amount of the information currently accessible over the Web and in corporate networks is stored in a variety of databases and is exported as XML data. However, querying this totality of information in a declarative and timely fashion is problematic because the set of databases is dynamic and a common schema is difficult to maintain. The present invention provides a solution to the problem of issuing declarative, ad hoc XPath queries against such a dynamic collection of XML databases and receiving timely answers. Decentralized architectures are proposed, under the open and the agreement cooperation models between a set of sites, for processing queries and updates to XML data. Each site consists of XML data nodes (which export their data as XML and also pose queries) and one XML router node (which manages the query and update interactions between sites). The architectures differ in the degree of knowledge individual router nodes have about data nodes containing specific XML data. There is therefore provided a method for accessing data over a wide area network comprising: providing a decentralized architecture comprising a plurality of data nodes, each having a database, a query processor and a path index, and a plurality of router nodes, each having a routing state; maintaining a routing state in each of the router nodes; broadcasting routing state updates from each of the databases to the router nodes; and routing path queries to each of the databases by accessing the routing state.
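The routing method in the abstract above (router nodes keep a routing state, data nodes broadcast updates, and path queries are routed by consulting that state) can be illustrated with a minimal sketch. It is not the patented implementation: the names `RouterNode`, `apply_update`, `route`, and `broadcast` are hypothetical, the routing state is assumed to be a plain map from an exact path to the data nodes that advertise it, and XPath features such as wildcards, descendant axes, and predicates are left out.

```python
from collections import defaultdict


class RouterNode:
    """Hypothetical router node; its routing state maps a path to data node ids."""

    def __init__(self):
        # Assumed representation: exact path string -> set of data node ids.
        self.routing_state = defaultdict(set)

    def apply_update(self, node_id, added_paths=(), removed_paths=()):
        """Apply a routing state update broadcast on behalf of one data node."""
        for path in added_paths:
            self.routing_state[path].add(node_id)
        for path in removed_paths:
            self.routing_state[path].discard(node_id)
            if not self.routing_state[path]:
                # Drop empty entries so stale paths do not accumulate.
                del self.routing_state[path]

    def route(self, path_query):
        """Return the sorted ids of data nodes that advertise the queried path."""
        return sorted(self.routing_state.get(path_query, set()))


def broadcast(routers, node_id, added_paths=(), removed_paths=()):
    """Send the same routing state update to every router node."""
    for router in routers:
        router.apply_update(node_id, added_paths, removed_paths)


routers = [RouterNode(), RouterNode()]
broadcast(routers, "node_a", added_paths=["/catalog/book/title", "/catalog/book/price"])
broadcast(routers, "node_b", added_paths=["/catalog/book/title"])
print(routers[0].route("/catalog/book/title"))  # ['node_a', 'node_b']
```

Because every router receives every update in this sketch, any router can answer a path query locally; the abstract notes that the proposed architectures differ in how much each router knows, which this simplification does not model.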
    • 9. Invention patent application
    • Text joins for data cleansing and integration in a relational database management system
    • US20050027717A1
    • 2005-02-03
    • US10828819
    • 2004-04-21
    • Nikolaos Koudas; Divesh Srivastava; Luis Gravano; Panagiotis Ipeirotis
    • Nikolaos Koudas; Divesh Srivastava; Luis Gravano; Panagiotis Ipeirotis
    • G06F7/02; G06F17/30
    • G06F16/2462; G06F16/215; G06F16/284; G06F16/3347
    • An organization's data records are often noisy because of transcription errors, incomplete information, and a lack of standard formats for textual data. A fundamental task during data cleansing and integration is matching strings, perhaps across multiple relations, that refer to the same entity (e.g., organization name or address). Furthermore, it is desirable to perform this matching within an RDBMS, which is where the data is likely to reside. In this paper, we adapt the widely used and established cosine similarity metric from the information retrieval field to the relational database context in order to identify potential string matches across relations. We then use this similarity metric to characterize this key aspect of data cleansing and integration as a join between relations on textual attributes, where the similarity of matches exceeds a specified threshold. Computing an exact answer to the text join can be expensive. For query processing efficiency, we propose an approximate, sampling-based approach to the join problem that can be easily and efficiently executed in a standard, unmodified RDBMS. Therefore the present invention includes a system for string matching across multiple relations in a relational database management system comprising generating a set of strings from a set of characters, decomposing each string into a subset of tokens, establishing at least two relations within the strings, establishing a similarity threshold for the relations, sampling the at least two relations, correlating the relations for the similarity threshold, and returning all of the tokens which meet the criteria of the similarity threshold.
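The text join described in the abstract above (decompose strings into tokens, weight them, and keep pairs whose cosine similarity exceeds a threshold) can be sketched as follows. This is an illustration under stated assumptions, not the claimed system: it computes the exact join in plain Python rather than the sampling-based approximation inside an RDBMS, it assumes character q-grams as tokens and a simple TF-IDF weighting computed over both relations together, and the names `qgrams`, `tfidf_vectors`, and `text_join` are hypothetical.

```python
import math
from collections import Counter


def qgrams(text, q=3):
    """Decompose a string into overlapping character q-grams (assumed tokenization)."""
    padded = "#" * (q - 1) + text.lower() + "#" * (q - 1)
    return [padded[i:i + q] for i in range(len(padded) - q + 1)]


def tfidf_vectors(strings, q=3):
    """Build unit-length TF-IDF weight vectors, one dict per input string."""
    token_counts = [Counter(qgrams(s, q)) for s in strings]
    # Document frequency: number of strings in which each token appears.
    doc_freq = Counter(tok for counts in token_counts for tok in counts)
    n = len(strings)
    vectors = []
    for counts in token_counts:
        weights = {tok: tf * math.log(1 + n / doc_freq[tok]) for tok, tf in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({tok: w / norm for tok, w in weights.items()})
    return vectors


def text_join(r1, r2, threshold, q=3):
    """Return (s1, s2, similarity) for pairs whose cosine similarity exceeds threshold."""
    vectors = tfidf_vectors(list(r1) + list(r2), q)
    v1, v2 = vectors[:len(r1)], vectors[len(r1):]
    matches = []
    for s1, a in zip(r1, v1):
        for s2, b in zip(r2, v2):
            # Vectors are unit length, so the dot product is the cosine similarity.
            sim = sum(w * b[tok] for tok, w in a.items() if tok in b)
            if sim > threshold:
                matches.append((s1, s2, sim))
    return matches


left = ["AT&T Labs Research", "Columbia University"]
right = ["ATT Labs - Research", "Columbia Univ.", "University of Toronto"]
for s1, s2, sim in text_join(left, right, threshold=0.3):
    print(f"{s1!r} ~ {s2!r}: {sim:.2f}")
```

The nested loop compares every pair, which is the expensive exact computation the abstract mentions; the sampling step it proposes would replace that loop with an approximation and is not shown here.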