Quick Patent Search - Fast search of global patents, free patent database for commercial use - IPRDB

1. Granted invention patent

US08046372B1 Duplicate entry detection system and method (in force)
Publication (announcement) number: US08046372B1
Publication (announcement) date: 2011-10-25
Application number: US11754237
Filing date: 2007-05-25
Applicants: Srikanth Thirumalai, Aswath Manoharan, Mark J. Tomko, Grant M. Emery, Vijai Mohan, Egidio Terra
Inventors: Srikanth Thirumalai, Aswath Manoharan, Mark J. Tomko, Grant M. Emery, Vijai Mohan, Egidio Terra
IPC classification: G06F7/00, G06F17/30
CPC classification: G06F17/30616
Abstract: A computer system and method for determining whether the subject matter described in a received document is substantially similar to the subject matter of other documents in a document corpus, such that the received document can be considered a duplicate document. After receiving a first document, a set of tokens for the first document is generated. A non-fielded relevance search on a token index is executed. The relevance search returns a set of candidate duplicate documents with scores corresponding to each candidate document. For each candidate document with a score above a threshold, filtering is performed on each candidate document to determine whether each candidate document is a true duplicate of the first document. A set of candidate documents with a score above the threshold that were not disqualified as candidate documents is then provided.
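Illustrative note (not part of the patent record): the pipeline in the abstract above (tokenize, run a non-fielded relevance search, apply a score threshold, then filter) can be sketched as follows. The tokenizer, the overlap-based score, the 0.6 threshold, and the `same_numeric_tokens` filter are all assumptions made for this sketch; the abstract does not specify any of them.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumed tokenizer: lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())

def relevance_search(query_tokens, token_index):
    # Non-fielded search: tokens are compared regardless of the field they came from.
    # Assumed score: fraction of query tokens also present in the candidate.
    query = Counter(query_tokens)
    scores = {}
    for doc_id, doc_tokens in token_index.items():
        shared = sum((query & Counter(doc_tokens)).values())
        scores[doc_id] = shared / max(len(query_tokens), 1)
    return scores

def same_numeric_tokens(received, candidate):
    # Hypothetical filter: tokens containing digits (sizes, model numbers) must match.
    def numeric(text):
        return {t for t in tokenize(text) if any(c.isdigit() for c in t)}
    return numeric(received) == numeric(candidate)

def find_duplicates(received, corpus, threshold=0.6, filters=()):
    token_index = {doc_id: tokenize(body) for doc_id, body in corpus.items()}
    scores = relevance_search(tokenize(received), token_index)
    # Keep candidates scoring above the threshold ...
    candidates = [doc_id for doc_id, score in scores.items() if score > threshold]
    # ... and drop any candidate that a filter disqualifies.
    return [d for d in candidates if all(keep(received, corpus[d]) for keep in filters)]

corpus = {
    "a": "Acme USB cable 2m black",
    "b": "Acme USB cable 3m black",
    "c": "Garden hose 10m",
}
unfiltered = find_duplicates("Acme USB cable 2m black", corpus)
filtered = find_duplicates("Acme USB cable 2m black", corpus,
                           filters=(same_numeric_tokens,))
```

In this toy corpus, document "b" passes the score threshold but is disqualified by the filter because its length token differs, which is the role the abstract assigns to the filtering step.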

2. Granted invention patent

US07895225B1 Identifying potential duplicates of a document in a document corpus (in force)
Publication (announcement) number: US07895225B1
Publication (announcement) date: 2011-02-22
Application number: US11952020
Filing date: 2007-12-06
Applicants: Srikanth Thirumalai, Aswath Manoharan, Mark J. Tomko, Grant M. Emery, Vijai Mohan
Inventors: Srikanth Thirumalai, Aswath Manoharan, Mark J. Tomko, Grant M. Emery, Vijai Mohan
IPC classification: G06F7/00, G06F17/00
CPC classification: G06F17/30528, G06F17/30483, G06F17/3071
Abstract: According to aspects of the disclosed subject matter, a method for identifying a set of documents from a document corpus that are potential duplicates of a source document is provided. A source document is obtained. A list of queries corresponding to a source document is identified. Each query in the identified list of queries is executed on the document corpus, wherein the execution of each query yields a corresponding results set identifying an ordered set of documents in the document corpus. For each document identified in each results set, a document score is generated for the identified document based on the identified document's ordinal position in its results set. A subset of the identified documents of the results set is selected according to the generated document scores that satisfy predetermined selection criteria. The selected subset of identified documents are stored or displayed.
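Illustrative note (not part of the patent record): scoring documents by their ordinal position in each query's ordered results set can be sketched as below. The reciprocal-rank formula, the summing of scores across queries, and the `min_score` selection criterion are assumptions chosen for this sketch; the abstract only says that the score depends on ordinal position and that selection follows predetermined criteria.

```python
from collections import defaultdict

def score_by_rank(result_sets):
    # Assumed scoring: position 1 earns 1.0, position 2 earns 0.5, and so on,
    # summed over every results set in which the document appears.
    scores = defaultdict(float)
    for results in result_sets:
        for position, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / position
    return dict(scores)

def select_potential_duplicates(result_sets, min_score=1.0):
    # Assumed selection criterion: keep documents whose total score reaches min_score.
    scores = score_by_rank(result_sets)
    selected = [doc_id for doc_id, score in scores.items() if score >= min_score]
    # Highest score first; ties broken by document id for a stable order.
    return sorted(selected, key=lambda doc_id: (-scores[doc_id], doc_id))

# Three hypothetical queries, each returning an ordered list of document ids.
result_sets = [["d1", "d2", "d3"], ["d2", "d1"], ["d4", "d2"]]
selected = select_potential_duplicates(result_sets)
```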

3. Granted invention patent

US07814107B1 Generating similarity scores for matching non-identical data strings (in force)
Publication (announcement) number: US07814107B1
Publication (announcement) date: 2010-10-12
Application number: US11754241
Filing date: 2007-05-25
Applicants: Srikanth Thirumalai, Egidio Terra, Vijai Mohan, Mark J. Tomko, Grant M. Emery, Aswath Manoharan
Inventors: Srikanth Thirumalai, Egidio Terra, Vijai Mohan, Mark J. Tomko, Grant M. Emery, Aswath Manoharan
IPC classification: G06F7/00, G06F17/00
CPC classification: G06F17/30011
Abstract: A system and method for determining the likelihood of two documents describing substantially similar subject matter is presented. A set of tokens for each of two documents is obtained, each set representing strings of characters found in the corresponding document. A matrix of token pairs is determined, each token pair comprising a token from each set of tokens. For each token pair in the matrix, a similarity score is determined. Those token pairs in the matrix with a similarity score above a threshold score are selected and added to a set of matched tokens. A similarity score for the two documents is determined according to the scores of the token pairs added to the set of matched tokens. The determined similarity score is provided as the likelihood that the first and second documents describing substantially similar subject matter.
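Illustrative note (not part of the patent record): the token-pair matrix described above can be sketched with Python's standard `difflib.SequenceMatcher` as the per-pair similarity measure. That measure, the 0.8 threshold, and the way matched scores are combined into a document score (sum divided by the longer token count) are assumptions made for this sketch, not details taken from the patent.

```python
from difflib import SequenceMatcher

def token_similarity(a, b):
    # Assumed per-pair measure: difflib's ratio, in the range 0.0 to 1.0.
    return SequenceMatcher(None, a, b).ratio()

def matched_token_pairs(tokens_a, tokens_b, threshold=0.8):
    # Walk the full matrix of pairs (one token from each document)
    # and keep the pairs whose similarity exceeds the threshold.
    matched = []
    for a in tokens_a:
        for b in tokens_b:
            score = token_similarity(a, b)
            if score > threshold:
                matched.append((a, b, score))
    return matched

def document_similarity(tokens_a, tokens_b, threshold=0.8):
    # Assumed aggregation: sum of matched pair scores over the longer token count.
    matched = matched_token_pairs(tokens_a, tokens_b, threshold)
    longest = max(len(tokens_a), len(tokens_b))
    if longest == 0:
        return 0.0
    return sum(score for _, _, score in matched) / longest

similar = document_similarity(["red", "colour", "shirt"], ["red", "color", "shirt"])
unrelated = document_similarity(["blue", "denim", "jeans"], ["red", "colour", "shirt"])
```

Here "colour" and "color" are non-identical strings that still clear the threshold, so the first pair of documents scores much higher than the second.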

4. Granted invention patent

US07567922B1 Method and system for generating a normalized configuration model (in force)
Publication (announcement) number: US07567922B1
Publication (announcement) date: 2009-07-28
Application number: US10924630
Filing date: 2004-08-24
Applicants: Michael E. Weinberg, David F. Meeker, Grant M. Emery
Inventors: Michael E. Weinberg, David F. Meeker, Grant M. Emery
IPC classification: G06Q30/00
CPC classification: G06Q30/00, G06Q30/0621
Abstract: Normalized data models are programmatically generated from a combination of product configuration model data, product configuration engine runtime validation, normalized data mappings, and settings files declaring the scope of model content. A master model generation process effectively transforms conventional configuration data into normalized configuration data. The normalized configuration data allows a user to, for example, conduct comparative product configurations. In one embodiment, a normalized model generation process generates normalized data model representing attributes and normalized features of a product. In one embodiment, the normalized configuration data model is then added to in-memory data structures used during runtime contextual configuration analysis, thus reducing the total number of data items preserved as efficiencies result from eliminating duplication and effective use of search structures. In-memory representation of the normalized configuration data model can then be serialized to disk as a file to be loaded for runtime use in a deployment.

5. Granted invention patent

US09195714B1 Identifying potential duplicates of a document in a document corpus (in force)
Publication (announcement) number: US09195714B1
Publication (announcement) date: 2015-11-24
Application number: US13030114
Filing date: 2011-02-17
Applicants: Srikanth Thirumalai, Aswath Manoharan, Mark J. Tomko, Grant M. Emery, Vijai Mohan
Inventors: Srikanth Thirumalai, Aswath Manoharan, Mark J. Tomko, Grant M. Emery, Vijai Mohan
IPC classification: G06F17/30
CPC classification: G06F17/30528, G06F17/30483, G06F17/3071
Abstract: According to aspects of the disclosed subject matter, a method for identifying a set of documents from a document corpus that are potential duplicates of a source document, is provided. A source document is obtained. A list of queries corresponding to the source document is identified. Each query in the identified list of queries is executed on the document corpus, wherein the execution of each query yields a corresponding results set identifying an ordered set of documents in the document corpus. For each document identified in each results set, a document score is generated for the identified document based on the identified document's ordinal position in its results set. A subset of the identified documents of the results set is selected according to the generated document scores that satisfy predetermined selection criteria. The selected subset of identified documents are stored or displayed.

6. Granted invention patent

US08744931B1 Method and apparatus for inventory searching (in force)
Publication (announcement) number: US08744931B1
Publication (announcement) date: 2014-06-03
Application number: US13571602
Filing date: 2012-08-10
Applicants: Grant M. Emery, Arpan Shah
Inventors: Grant M. Emery, Arpan Shah
IPC classification: G06Q10/00, G06Q30/00
CPC classification: G06Q10/087, G06Q30/0633
Abstract: A method is disclosed that includes identifying an inventory item corresponding to a product configuration. The product configuration is defined using a feature map. The inventory item is also defined using the feature map. Each entry of the feature map corresponds to one of a number of features of a product.
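Illustrative note (not part of the patent record): one simple reading of a shared feature map is a fixed-order tuple of booleans, one entry per product feature, used to describe both the requested configuration and each inventory item. The feature names, the boolean encoding, and the exact-match rule below are assumptions for this sketch; the abstract does not say how entries are encoded or compared.

```python
# Hypothetical feature list; each position in the map corresponds to one feature.
FEATURES = ["sunroof", "leather_seats", "awd", "tow_package"]

def to_feature_map(selected):
    # One boolean entry per feature, in the fixed order given by FEATURES.
    return tuple(feature in selected for feature in FEATURES)

def find_matching_inventory(configuration, inventory):
    # Assumed matching rule: an item matches when its feature map is
    # identical to the feature map of the requested configuration.
    wanted = to_feature_map(configuration)
    return [item_id for item_id, features in inventory.items()
            if to_feature_map(features) == wanted]

# Hypothetical inventory keyed by item id.
inventory = {
    "vin-1": {"sunroof", "awd"},
    "vin-2": {"leather_seats"},
    "vin-3": {"awd", "sunroof"},
}
matches = find_matching_inventory({"sunroof", "awd"}, inventory)
```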

7. Granted invention patent

US08244604B1 Method and apparatus for inventory searching (in force)
Publication (announcement) number: US08244604B1
Publication (announcement) date: 2012-08-14
Application number: US12749803
Filing date: 2010-03-30
Applicants: Grant M. Emery, Arpan Shah
Inventors: Grant M. Emery, Arpan Shah
IPC classification: G06Q10/00, G06Q30/00
CPC classification: G06Q10/087, G06Q30/0633
Abstract: A method is disclosed that includes identifying an inventory item corresponding to a product configuration. The product configuration is defined using a feature map. The inventory item is also defined using the feature map. Each entry of the feature map corresponds to one of a number of features of a product.

8. Granted invention patent

US07970773B1 Determining variation sets among product descriptions (in force)
Publication (announcement) number: US07970773B1
Publication (announcement) date: 2011-06-28
Application number: US11863020
Filing date: 2007-09-27
Applicants: Srikanth Thirumalai, Aswath Manoharan, Xiaoxin Yin, Mark J. Tomko, Grant M. Emery, Vijai Mohan, Egidio Terra
Inventors: Srikanth Thirumalai, Aswath Manoharan, Xiaoxin Yin, Mark J. Tomko, Grant M. Emery, Vijai Mohan, Egidio Terra
IPC classification: G06F7/00
CPC classification: G06F17/2211, Y10S707/917
Abstract: Systems and methods for determining a set of variation-phrases from a collection of documents in a document corpus is presented. Potential variation-phrase pairs among the various documents in the document corpus are identified. The identified potential variation-phrase pairs are then added to a variation-phrase set. The potential variation-phrase pairs in the variation-phrase set are filtered to remove those potential variation-phrase pairs that do not satisfy a predetermined criteria. After filtering the variation-phrase set, the resulting variation-phrase set is stored in a data store.

9. Granted invention patent

US07908279B1 Filtering invalid tokens from a document using high IDF token filtering (in force)
Publication (announcement) number: US07908279B1
Publication (announcement) date: 2011-03-15
Application number: US11856581
Filing date: 2007-09-17
Applicants: Srikanth Thirumalai, Aswath Manoharan, Mark J. Tomko, Grant M. Emery, Vijai Mohan, Egidio Terra
Inventors: Srikanth Thirumalai, Aswath Manoharan, Mark J. Tomko, Grant M. Emery, Vijai Mohan, Egidio Terra
IPC classification: G06F7/00, G06F17/30, G06F17/21, G06F9/445
CPC classification: G06F17/2211, Y10S707/917
Abstract: Systems and methods for filtering tokens from a document for determining whether the document describes substantially similar subject matter compared to another document are described. In one embodiment, a first document is obtained. This document is organized into a plurality of fields, and at least some of the fields include tokens representing the subject matter described by the document. A field of this document is selected and a token from within the selected field having the highest inverse document frequency (IDF) is selected. Those tokens that have a higher IDF than the selected token are removed. Using the remaining tokens, a determination is made as to whether the first document describes substantially similar subject matter to the subject matter described by a second document. An indication is provided as to whether the first document describes substantially similar subject matter to that described by a second document according to the determination.
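Illustrative note (not part of the patent record): the high-IDF filtering step above can be sketched as follows. The highest IDF found in a chosen reference field acts as a ceiling, and any token elsewhere in the document with a higher IDF is dropped. The smoothed IDF formula, the choice of "title" as the reference field, and the sample corpus are assumptions made for this sketch.

```python
import math

def make_idf(corpus_token_sets):
    # Assumed smoothed IDF: log((N + 1) / (df + 1)), so unseen tokens score highest.
    total = len(corpus_token_sets)
    def idf(token):
        containing = sum(1 for tokens in corpus_token_sets if token in tokens)
        return math.log((total + 1) / (containing + 1))
    return idf

def filter_high_idf_tokens(document, reference_field, idf):
    # The highest IDF inside the reference field becomes the ceiling.
    ceiling = max((idf(t) for t in document[reference_field]), default=float("inf"))
    # Remove every token, in any field, whose IDF exceeds that ceiling.
    return {field: [t for t in tokens if idf(t) <= ceiling]
            for field, tokens in document.items()}

# Hypothetical corpus statistics and a two-field document.
idf = make_idf([
    {"acme", "usb", "cable", "black"},
    {"acme", "usb", "hub"},
    {"acme", "hdmi", "cable"},
    {"usb", "cable", "white"},
])
document = {
    "title": ["acme", "hdmi", "cable"],
    "description": ["cable", "zzqx", "black", "usb"],
}
filtered_document = filter_high_idf_tokens(document, "title", idf)
```

In this example "zzqx" never appears in the corpus, so its IDF exceeds that of "hdmi" (the rarest title token) and it is removed, while "black" ties with "hdmi" and is kept.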

10. Granted invention patent

US07904462B1 Comparison engine for identifying documents describing similar subject matter (in force)
Publication (announcement) number: US07904462B1
Publication (announcement) date: 2011-03-08
Application number: US11953726
Filing date: 2007-12-10
Applicants: Srikanth Thirumalai, Aswath Manoharan, Mark J. Tomko, Grant M. Emery, Vijai Mohan, Egidio Terra
Inventors: Srikanth Thirumalai, Aswath Manoharan, Mark J. Tomko, Grant M. Emery, Vijai Mohan, Egidio Terra
IPC classification: G06F7/00, G06F17/00
CPC classification: G06Q30/06
Abstract: Systems and methods for determining whether a first document is a potential duplicate of a second document such that the two documents describe the same or substantially the same subject matter, wherein the first and second documents include attribute data in attribute fields. A set of rules is obtained for determining whether the first document is a potential duplicate of the second document. Moreover, for each rule in the set of rules, a determination is made as to whether data in a first set of attributes of the first document is contained in a second set of attributes of the second document. According to the results of the evaluated rules in the rules set, determining whether the first document is a potential duplicate of the second document. If, according to the evaluated rules in the rules set, the first document is determined to be a potential duplicate of the second document, storing a reference to the first document in a set of potential duplicates of the second document.
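Illustrative note (not part of the patent record): the rule evaluation above can be sketched as a containment check between attribute sets. Each rule here is a pair of field lists, and it holds when every word found in the first document's listed fields also appears in the second document's listed fields. The word-level comparison, the requirement that all rules hold, and the field names are assumptions for this sketch.

```python
def words_in(document, fields):
    # Collect the lowercase words found in the given attribute fields.
    return {w for f in fields for w in document.get(f, "").lower().split()}

def rule_holds(rule, first, second):
    # A rule is (fields of the first document, fields of the second document).
    first_fields, second_fields = rule
    return words_in(first, first_fields) <= words_in(second, second_fields)

def record_potential_duplicate(rules, first_id, first, second_id, second, store):
    # Assumed combination: the first document is a potential duplicate
    # only if every rule holds; if so, store a reference to it.
    if all(rule_holds(rule, first, second) for rule in rules):
        store.setdefault(second_id, set()).add(first_id)
        return True
    return False

# Hypothetical rules and documents.
rules = [
    (("brand",), ("brand", "title")),
    (("model",), ("title", "description")),
]
listing = {"brand": "ACME", "title": "Acme X200 wireless mouse",
           "description": "Compact mouse with USB receiver"}
store = {}
hit = record_potential_duplicate(
    rules, "offer-a", {"brand": "Acme", "model": "X200"}, "listing-1", listing, store)
miss = record_potential_duplicate(
    rules, "offer-b", {"brand": "Acme", "model": "X300"}, "listing-1", listing, store)
```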
