    • 1. Granted invention patent
    • Duplicate entry detection system and method
    • US08046372B1
    • 2011-10-25
    • US11754237
    • 2007-05-25
    • Srikanth Thirumalai, Aswath Manoharan, Mark J. Tomko, Grant M. Emery, Vijai Mohan, Egidio Terra
    • G06F7/00, G06F17/30
    • G06F17/30616
    • A computer system and method for determining whether the subject matter described in a received document is substantially similar to the subject matter of other documents in a document corpus, such that the received document can be considered a duplicate document. After receiving a first document, a set of tokens for the first document is generated. A non-fielded relevance search on a token index is executed. The relevance search returns a set of candidate duplicate documents with scores corresponding to each candidate document. For each candidate document with a score above a threshold, filtering is performed on each candidate document to determine whether each candidate document is a true duplicate of the first document. A set of candidate documents with a score above the threshold that were not disqualified as candidate documents is then provided.
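The pipeline in this abstract (tokenize the received document, run a non-fielded relevance search over a token index, keep candidates scoring above a threshold, then filter out candidates that are not true duplicates) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the lower-case word tokenizer, the summed-IDF relevance score, and the Jaccard-similarity check used as the "true duplicate" filter (`min_jaccard`) are all hypothetical choices, since the abstract specifies none of them.

```python
import math
import re
from collections import defaultdict


def tokenize(text):
    """Hypothetical tokenizer: the set of lower-case alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class TokenIndex:
    """Minimal inverted index mapping each token to the ids of documents containing it."""

    def __init__(self, corpus):
        self.docs = {doc_id: tokenize(text) for doc_id, text in corpus.items()}
        self.postings = defaultdict(set)
        for doc_id, tokens in self.docs.items():
            for tok in tokens:
                self.postings[tok].add(doc_id)

    def relevance_search(self, tokens):
        """Non-fielded search: score each document by the summed IDF of shared tokens."""
        n = len(self.docs)
        scores = defaultdict(float)
        for tok in tokens:
            matching = self.postings.get(tok, ())
            if not matching:
                continue
            idf = math.log(1 + n / len(matching))  # assumed weighting
            for doc_id in matching:
                scores[doc_id] += idf
        return dict(scores)


def find_duplicates(index, text, score_threshold, min_jaccard=0.8):
    """Return ids of candidates that score above the threshold and survive filtering."""
    tokens = tokenize(text)
    survivors = []
    for doc_id, score in index.relevance_search(tokens).items():
        if score <= score_threshold:
            continue  # at or below the relevance threshold: not a candidate
        other = index.docs[doc_id]
        # Assumed filter: disqualify candidates whose token sets overlap too little.
        if len(tokens & other) / len(tokens | other) >= min_jaccard:
            survivors.append(doc_id)
    return sorted(survivors)


if __name__ == "__main__":
    corpus = {
        "a": "Red cotton T-shirt, size M",
        "b": "red cotton t-shirt size m",
        "c": "Blue denim jacket",
        "d": "Red cotton T-shirt, size L",
    }
    print(find_duplicates(TokenIndex(corpus), "Red Cotton T-Shirt Size M", 0.5))
```

With this toy corpus, `d` shares five of its tokens with the query and so clears the relevance threshold, but the assumed Jaccard filter (5/7, about 0.71, below 0.8) disqualifies it, leaving `a` and `b`; `c` shares no tokens and never becomes a candidate.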
    • 2. Granted invention patent
    • Identifying potential duplicates of a document in a document corpus
    • US07895225B1
    • 2011-02-22
    • US11952020
    • 2007-12-06
    • Srikanth Thirumalai, Aswath Manoharan, Mark J. Tomko, Grant M. Emery, Vijai Mohan
    • G06F7/00, G06F17/00
    • G06F17/30528, G06F17/30483, G06F17/3071
    • According to aspects of the disclosed subject matter, a method for identifying a set of documents from a document corpus that are potential duplicates of a source document is provided. A source document is obtained. A list of queries corresponding to a source document is identified. Each query in the identified list of queries is executed on the document corpus, wherein the execution of each query yields a corresponding results set identifying an ordered set of documents in the document corpus. For each document identified in each results set, a document score is generated for the identified document based on the identified document's ordinal position in its results set. A subset of the identified documents of the results set is selected according to the generated document scores that satisfy predetermined selection criteria. The selected subset of identified documents is stored or displayed.
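One way to read the scoring step is to let each query's ordered results set vote for its documents by position. The sketch below assumes a reciprocal-rank weight (position 1 scores 1.0, position 2 scores 0.5, and so on) summed across results sets, and treats a minimum score plus a result cap (`min_score`, `max_results`) as the "predetermined selection criteria". Both choices, and the `run_query` callable standing in for executing a query on the corpus, are hypothetical rather than taken from the patent.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def score_by_position(result_sets: Sequence[List[str]]) -> Dict[str, float]:
    """Sum an assumed reciprocal-rank score over every results set a document appears in."""
    scores: Dict[str, float] = defaultdict(float)
    for results in result_sets:
        for position, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / position
    return dict(scores)


def potential_duplicates(
    queries: Sequence[str],
    run_query: Callable[[str], List[str]],
    min_score: float,
    max_results: int = 10,
) -> List[str]:
    """Run every query, score documents by rank, and keep those meeting the criteria."""
    result_sets = [run_query(q) for q in queries]
    scores = score_by_position(result_sets)
    selected = [doc for doc, score in scores.items() if score >= min_score]
    # Highest score first; ties broken by id so the output is deterministic.
    selected.sort(key=lambda doc: (-scores[doc], doc))
    return selected[:max_results]


if __name__ == "__main__":
    # Canned results stand in for executing each query against a real corpus.
    canned = {
        "q1": ["doc7", "doc3", "doc9"],
        "q2": ["doc3", "doc7"],
        "q3": ["doc3", "doc1"],
    }
    print(potential_duplicates(["q1", "q2", "q3"], canned.__getitem__, min_score=1.0))
```

With these canned results, `doc3` scores 0.5 + 1 + 1 = 2.5 and `doc7` scores 1 + 0.5 = 1.5, so both are selected, while `doc9` (1/3) and `doc1` (0.5) fall below the assumed cutoff. A document that ranks moderately well across many queries can therefore outscore one that tops a single query, which is one plausible motivation for position-based aggregation.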
    • 4. Granted invention patent
    • Method and system for generating a normalized configuration model
    • US07567922B1
    • 2009-07-28
    • US10924630
    • 2004-08-24
    • Michael E. Weinberg, David F. Meeker, Grant M. Emery
    • G06Q30/00
    • G06Q30/00, G06Q30/0621
    • Normalized data models are programmatically generated from a combination of product configuration model data, product configuration engine runtime validation, normalized data mappings, and settings files declaring the scope of model content. A master model generation process effectively transforms conventional configuration data into normalized configuration data. The normalized configuration data allows a user to, for example, conduct comparative product configurations. In one embodiment, a normalized model generation process generates a normalized data model representing attributes and normalized features of a product. In one embodiment, the normalized configuration data model is then added to in-memory data structures used during runtime contextual configuration analysis, thus reducing the total number of data items preserved, as efficiencies result from eliminating duplication and from effective use of search structures. The in-memory representation of the normalized configuration data model can then be serialized to disk as a file to be loaded for runtime use in a deployment.
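The abstract leaves the data formats open. As a rough sketch only, assume a raw configuration model is a dict of vendor-specific feature codes, the "normalized data mappings" are a dict from those codes to normalized attribute names, and the settings file's scope is a list of attribute names to keep. The hypothetical `build_master_model` below normalizes each product, stores every distinct (attribute, value) pair once in a shared table so products reference it by index rather than duplicating it, and serializes the result as JSON for loading at runtime. None of these names or formats come from the patent.

```python
import json
from typing import Dict, List


def normalize(raw: Dict[str, str], mappings: Dict[str, str], scope: List[str]) -> Dict[str, str]:
    """Rename raw feature codes to normalized attribute names, dropping out-of-scope ones."""
    out = {}
    for code, value in raw.items():
        name = mappings.get(code)
        if name is not None and name in scope:
            out[name] = value
    return out


def build_master_model(
    products: Dict[str, Dict[str, str]], mappings: Dict[str, str], scope: List[str]
) -> dict:
    """Normalize every product and store each distinct (attribute, value) pair once."""
    values: List[List[str]] = []     # shared table of unique [attribute, value] pairs
    position: Dict[tuple, int] = {}  # pair -> index in `values`
    refs: Dict[str, List[int]] = {}  # product id -> indices into `values`
    for product_id, raw in products.items():
        refs[product_id] = []
        for pair in sorted(normalize(raw, mappings, scope).items()):
            if pair not in position:
                position[pair] = len(values)
                values.append(list(pair))
            refs[product_id].append(position[pair])
    return {"values": values, "products": refs}


def save_model(model: dict, path: str) -> None:
    """Serialize the in-memory model to disk as a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(model, f)


def load_model(path: str) -> dict:
    """Load a previously saved model for runtime use."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    mappings = {"ENG_V6": "engine", "MOTOR": "engine", "CLR": "color", "COLOUR": "color", "NOTE": "note"}
    products = {
        "sedan": {"ENG_V6": "V6", "CLR": "red", "NOTE": "internal"},
        "coupe": {"MOTOR": "V6", "COLOUR": "blue"},
    }
    model = build_master_model(products, mappings, scope=["engine", "color"])
    print(model)
```

Here the two products use different raw codes (`ENG_V6`, `MOTOR`) for the same engine, so after normalization the `engine: V6` pair is stored once and referenced by both, and the out-of-scope `note` attribute is dropped.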
    • 5. Granted invention patent
    • Identifying potential duplicates of a document in a document corpus
    • US09195714B1
    • 2015-11-24
    • US13030114
    • 2011-02-17
    • Srikanth Thirumalai, Aswath Manoharan, Mark J. Tomko, Grant M. Emery, Vijai Mohan
    • G06F17/30
    • G06F17/30528, G06F17/30483, G06F17/3071
    • According to aspects of the disclosed subject matter, a method for identifying a set of documents from a document corpus that are potential duplicates of a source document is provided. A source document is obtained. A list of queries corresponding to the source document is identified. Each query in the identified list of queries is executed on the document corpus, wherein the execution of each query yields a corresponding results set identifying an ordered set of documents in the document corpus. For each document identified in each results set, a document score is generated for the identified document based on the identified document's ordinal position in its results set. A subset of the identified documents of the results set is selected according to the generated document scores that satisfy predetermined selection criteria. The selected subset of identified documents is stored or displayed.
    • 10. Granted invention patent
    • Comparison engine for identifying documents describing similar subject matter
    • US07904462B1
    • 2011-03-08
    • US11953726
    • 2007-12-10
    • Srikanth Thirumalai, Aswath Manoharan, Mark J. Tomko, Grant M. Emery, Vijai Mohan, Egidio Terra
    • G06F7/00, G06F17/00
    • G06Q30/06
    • Systems and methods for determining whether a first document is a potential duplicate of a second document such that the two documents describe the same or substantially the same subject matter, wherein the first and second documents include attribute data in attribute fields. A set of rules is obtained for determining whether the first document is a potential duplicate of the second document. Moreover, for each rule in the set of rules, a determination is made as to whether data in a first set of attributes of the first document is contained in a second set of attributes of the second document. According to the results of the evaluated rules in the rule set, it is determined whether the first document is a potential duplicate of the second document. If, according to the evaluated rules in the rule set, the first document is determined to be a potential duplicate of the second document, a reference to the first document is stored in a set of potential duplicates of the second document.
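A rough sketch of the rule evaluation follows, assuming documents are dicts of attribute fields and that "contained in" means every word of the first document's listed attributes appears among the words of the second document's listed attributes. The `Rule` type, the word-level containment test, and the policy of flagging a potential duplicate when any rule holds are all assumptions for illustration; the abstract does not say how rule results are combined.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: words in `source_fields` of the first document
    must all appear in `target_fields` of the second document."""
    source_fields: Tuple[str, ...]
    target_fields: Tuple[str, ...]


def _words(doc: Dict[str, str], fields: Tuple[str, ...]) -> Set[str]:
    """Collect lower-case, punctuation-stripped words from the given attribute fields."""
    words: Set[str] = set()
    for field in fields:
        words.update(w.strip(",.;:").lower() for w in doc.get(field, "").split())
    words.discard("")
    return words


def rule_holds(rule: Rule, first: Dict[str, str], second: Dict[str, str]) -> bool:
    """True if the first document's source words are contained in the second's target words."""
    source = _words(first, rule.source_fields)
    # An empty source (missing attributes) never counts as contained.
    return bool(source) and source <= _words(second, rule.target_fields)


def record_if_duplicate(first_id: str, first: Dict[str, str], second_id: str,
                        second: Dict[str, str], rules: List[Rule],
                        potential_dups: Dict[str, Set[str]]) -> bool:
    """Evaluate every rule; on any match, store a reference to the first document."""
    results = [rule_holds(rule, first, second) for rule in rules]
    if any(results):
        potential_dups[second_id].add(first_id)
        return True
    return False


if __name__ == "__main__":
    rules = [Rule(("brand", "model"), ("brand", "title")), Rule(("upc",), ("upc",))]
    dups: Dict[str, Set[str]] = defaultdict(set)
    listing_b = {"brand": "ACME", "title": "Acme X100 blender, 600W"}
    record_if_duplicate("A", {"brand": "Acme", "model": "X100"}, "B", listing_b, rules, dups)
    record_if_duplicate("C", {"brand": "Acme", "model": "X200"}, "B", listing_b, rules, dups)
    print(dict(dups))
```

In this example, document `A` is recorded as a potential duplicate of `B` because its brand and model words both appear in `B`'s brand and title, while `C` fails the first rule (`x200` is absent) and the second rule (it has no `upc`).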