Quick Patent Search: fast search of global patents, free patent database for commercial use - IPRDB

1. Granted invention patent

US09195714B1 Identifying potential duplicates of a document in a document corpus (status: active)
Publication number: US09195714B1
Publication date: 2015-11-24
Application number: US13030114
Filing date: 2011-02-17
Applicants: Srikanth Thirumalai, Aswath Manoharan, Mark J. Tomko, Grant M. Emery, Vijai Mohan
Inventors: Srikanth Thirumalai, Aswath Manoharan, Mark J. Tomko, Grant M. Emery, Vijai Mohan
IPC classification: G06F17/30
CPC classification: G06F17/30528, G06F17/30483, G06F17/3071
Abstract: According to aspects of the disclosed subject matter, a method for identifying a set of documents from a document corpus that are potential duplicates of a source document, is provided. A source document is obtained. A list of queries corresponding to the source document is identified. Each query in the identified list of queries is executed on the document corpus, wherein the execution of each query yields a corresponding results set identifying an ordered set of documents in the document corpus. For each document identified in each results set, a document score is generated for the identified document based on the identified document's ordinal position in its results set. A subset of the identified documents of the results set is selected according to the generated document scores that satisfy predetermined selection criteria. The selected subset of identified documents are stored or displayed.
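The scoring step in this abstract can be illustrated with a short Python sketch. It is only one possible reading: the abstract says the score depends on a document's ordinal position in its results set and that selection uses predetermined criteria, but it gives no formula. The reciprocal-rank weighting, the summation across results sets, the `min_score` threshold, and every name below are assumptions made for illustration.

```python
from collections import defaultdict

def score_candidates(result_sets, min_score=1.0):
    """Score documents by ordinal position in each ordered results set.

    Assumption: 1-based position p contributes 1/p, and contributions are
    summed across results sets. The abstract does not specify the formula.
    """
    scores = defaultdict(float)
    for results in result_sets:
        for position, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / position
    # Assumed selection criterion: keep documents scoring at least min_score.
    selected = {doc: s for doc, s in scores.items() if s >= min_score}
    # Highest score first; ties broken by document id for stable output.
    return sorted(selected, key=lambda d: (-selected[d], d))

# Hypothetical results sets, one per query derived from the source document.
result_sets = [["d1", "d2", "d3"], ["d2", "d1"], ["d4", "d2"]]
candidates = score_candidates(result_sets)
print(candidates)
```

A document that appears near the top of several results sets accumulates the highest score, which matches the intuition that a likely duplicate should surface for most queries built from the source document.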

2. Granted invention patent

US07970773B1 Determining variation sets among product descriptions (status: active)
Publication number: US07970773B1
Publication date: 2011-06-28
Application number: US11863020
Filing date: 2007-09-27
Applicants: Srikanth Thirumalai, Aswath Manoharan, Xiaoxin Yin, Mark J. Tomko, Grant M. Emery, Vijai Mohan, Egidio Terra
Inventors: Srikanth Thirumalai, Aswath Manoharan, Xiaoxin Yin, Mark J. Tomko, Grant M. Emery, Vijai Mohan, Egidio Terra
IPC classification: G06F7/00
CPC classification: G06F17/2211, Y10S707/917
Abstract: Systems and methods for determining a set of variation-phrases from a collection of documents in a document corpus is presented. Potential variation-phrase pairs among the various documents in the document corpus are identified. The identified potential variation-phrase pairs are then added to a variation-phrase set. The potential variation-phrase pairs in the variation-phrase set are filtered to remove those potential variation-phrase pairs that do not satisfy a predetermined criteria. After filtering the variation-phrase set, the resulting variation-phrase set is stored in a data store.

3. Invention application

US20080263090A1 Enabling Interactive Integration of Network-Accessible Applications in a Content Aggregation Framework (status: active)
Publication number: US20080263090A1
Publication date: 2008-10-23
Application number: US12169640
Filing date: 2008-07-09
Applicants: Amber Roy-Chowdhury, Srikanth Thirumalai
Inventors: Amber Roy-Chowdhury, Srikanth Thirumalai
IPC classification: G06F17/00
CPC classification: G06Q30/0613, H04L67/02
Abstract: Enabling network-accessible applications to be integrated into content aggregation frameworks (such as portals) and to become dynamically interactive through proxying components (such as proxying portlets), thereby providing run-time cooperation and data sharing.

4. Granted invention patent

US08838618B1 System and method for identifying feature phrases in item description information (status: active)
Publication number: US08838618B1
Publication date: 2014-09-16
Application number: US13175124
Filing date: 2011-07-01
Applicants: Jianhui Wu, Nicholas R. Boyd, Srikanth Thirumalai
Inventors: Jianhui Wu, Nicholas R. Boyd, Srikanth Thirumalai
IPC classification: G06F17/30
CPC classification: G06F17/30625
Abstract: Embodiments may include, for each item in a subset of items from a larger group of items, evaluating item description information about that item to identify a respective set of candidate phrases to be evaluated. Embodiments may also include, for each phrase in the sets of candidate phrases, generating multiple component scores based on one or more of the frequency with which that phrase occurs in the item description information for the subset of items and/or the frequency with which that phrase occurs in a corpus of item description information for the overall group of items. Embodiments may also include, for each phrase in the sets of candidate phrases, generating a respective phrase score based on the component scores generated for that phrase. Embodiments may include, based on phrase scores, selecting a subset of phrases from the sets of candidate phrases as being feature phrases for the subset of items.
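A small Python sketch may help make the two-frequency scoring in this abstract concrete. It is a simplification under stated assumptions: candidate phrases are single lowercase words, the two component scores are document frequencies within the subset and within the overall corpus, the phrase score is their ratio, and the corpus includes the subset (so the denominator is never zero). None of these choices come from the patent; the function and variable names are hypothetical.

```python
from collections import Counter

def doc_freq(docs):
    """Count, for each word, the number of descriptions that contain it."""
    counts = Counter()
    for text in docs:
        counts.update(set(text.lower().split()))
    return counts

def feature_phrases(subset_docs, corpus_docs, top_k=2):
    """Rank candidate phrases for a subset of items (assumed scoring)."""
    subset_df = doc_freq(subset_docs)
    corpus_df = doc_freq(corpus_docs)
    scores = {}
    for phrase, sub_count in subset_df.items():
        subset_rate = sub_count / len(subset_docs)          # component score 1
        corpus_rate = corpus_df[phrase] / len(corpus_docs)  # component score 2
        scores[phrase] = subset_rate / corpus_rate          # combined phrase score
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return ranked[:top_k]

# Hypothetical item descriptions; the corpus contains the subset.
subset = ["waterproof hiking boots", "waterproof leather boots"]
corpus = subset + ["red cotton shirt", "leather wallet",
                   "cotton hiking socks", "blue shirt"]
top = feature_phrases(subset, corpus)
print(top)
```

Words that occur in every description of the subset but rarely elsewhere score highest, so "boots" and "waterproof" outrank "hiking" and "leather" in this toy data.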

5. Granted invention patent

US08577814B1 System and method for genetic creation of a rule set for duplicate detection (status: active)
Publication number: US08577814B1
Publication date: 2013-11-05
Application number: US13193285
Filing date: 2011-07-28
Applicants: Jianhui Wu, Srikanth Thirumalai
Inventors: Jianhui Wu, Srikanth Thirumalai
IPC classification: G06F15/18
CPC classification: G06N3/126
Abstract: Embodiments may generate a population of candidate rules including multiple rule conditions for detecting duplicates, each duplicate representing different sets of item description information that describe a common item. For each candidate rule of the population, embodiments may apply that rule to a reference data set including known duplicates and non-duplicates. Embodiments may assign each candidate rule a fitness score generated with a fitness function based on the performance of that candidate rule. Embodiments may, based on the fitness scores, select a subset of the population of candidate rules as parents for the new generation of candidate rules. Embodiments may perform crossover and/or mutation operations on the parent candidate rules to generate the new generation of candidate rules. Embodiments may select from the new generation of candidate rules (or from subsequent generations of candidate rules), rules for inclusion within a rule set for detecting duplicates within item description information.
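The generate, score, select, crossover, mutate loop in this abstract follows the usual shape of a genetic algorithm, which the Python sketch below illustrates. Everything specific is an assumption: a rule is modeled as a tuple of booleans saying which attributes must match, fitness is plain accuracy on a tiny hypothetical reference set, parents are carried into the next generation unchanged, and the attribute names, rates, and sizes are invented for the example.

```python
import random

ATTRS = ["brand", "model", "color"]  # hypothetical item attributes

# Hypothetical reference data: (item_a, item_b, is_known_duplicate)
REFERENCE = [
    ({"brand": "acme", "model": "x1", "color": "red"},
     {"brand": "acme", "model": "x1", "color": "crimson"}, True),
    ({"brand": "acme", "model": "x1", "color": "red"},
     {"brand": "acme", "model": "x2", "color": "red"}, False),
    ({"brand": "acme", "model": "z9", "color": "blue"},
     {"brand": "zen", "model": "z9", "color": "blue"}, False),
]

def predicts_duplicate(rule, a, b):
    # A rule is a tuple of booleans; True means "this attribute must match".
    return all(a[attr] == b[attr] for attr, on in zip(ATTRS, rule) if on)

def fitness(rule):
    # Fraction of reference pairs the rule labels correctly (assumed metric).
    hits = sum(predicts_duplicate(rule, a, b) == label
               for a, b, label in REFERENCE)
    return hits / len(REFERENCE)

def evolve(generations=10, size=6, seed=0):
    rng = random.Random(seed)
    population = [tuple(rng.random() < 0.5 for _ in ATTRS)
                  for _ in range(size)]
    for _ in range(generations):
        # Select the fitter half of the population as parents.
        parents = sorted(population, key=fitness, reverse=True)[: size // 2]
        children = []
        while len(children) < size - len(parents):
            mom, dad = rng.sample(parents, 2)
            # Crossover: each condition is inherited from one parent.
            child = [rng.choice(pair) for pair in zip(mom, dad)]
            # Mutation: occasionally flip one condition.
            if rng.random() < 0.2:
                i = rng.randrange(len(child))
                child[i] = not child[i]
            children.append(tuple(child))
        # Assumption: parents survive alongside their children.
        population = parents + children
    best = max(population, key=fitness)
    return best, fitness(best)

best_rule, best_score = evolve()
print(best_rule, best_score)
```

On this reference set the rule "brand and model must match, color may differ" classifies every pair correctly, so it is the target the search should drift toward; the sketch makes no promise that a given seed reaches it.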

6. Granted invention patent

US08527475B1 System and method for identifying structured data items lacking requisite information for rule-based duplicate detection (status: active)
Publication number: US08527475B1
Publication date: 2013-09-03
Application number: US13239068
Filing date: 2011-09-21
Applicants: Roshan Ram Rammohan, Madhu M Kurup, Srikanth Thirumalai
Inventors: Roshan Ram Rammohan, Madhu M Kurup, Srikanth Thirumalai
IPC classification: G06F17/30
CPC classification: G06F17/30489
Abstract: Embodiments of a system and method for identifying structured data items lacking requisite information for rule-based duplicate detection are described. Embodiments may include generating a deficiency score for each of multiple structured data items including applying a set of rules based on duplicate detection techniques to each given structured data item in order to perform a comparison of the given structured data item to itself. The deficiency score of the given structured data item may be based on a result of the comparison. Embodiments may also include, based on the deficiency scores of the structured data items, identifying one or more deficient structured data items having less than a requisite quantity of information for performing duplicate detection on structured data items. Embodiments may also include identifying one or more key attributes missing from some of the one or more deficient structured data items and requesting those key attributes.
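The self-comparison idea in this abstract is easy to show in a few lines of Python: if a duplicate-detection rule cannot match an item against itself, the item must be missing something the rule needs. The sketch below assumes that rules are tuples of attribute names that must all be present and equal, and that the deficiency score is the fraction of rules that fail on self-comparison. The rule set, attribute names, and scoring are hypothetical, not taken from the patent.

```python
# Hypothetical rule set: each rule lists attributes that must all match.
RULES = [("brand", "model"), ("upc",), ("title", "color")]

def rule_matches(rule, a, b):
    # A rule fires only when every attribute is present and equal in both items.
    return all(a.get(k) not in (None, "") and a.get(k) == b.get(k)
               for k in rule)

def deficiency_score(item):
    # Compare the item to itself; a rule that fails here lacks required data.
    failed = sum(not rule_matches(rule, item, item) for rule in RULES)
    return failed / len(RULES)

def missing_key_attributes(item):
    # Attributes used by some rule but absent or empty in the item.
    needed = {k for rule in RULES for k in rule}
    return sorted(k for k in needed if item.get(k) in (None, ""))

sparse_item = {"brand": "acme", "model": "x1", "title": "Acme X1"}
score = deficiency_score(sparse_item)
missing = missing_key_attributes(sparse_item)
print(score, missing)
```

Here only the brand-and-model rule can match the item with itself, so two of three rules fail, and `upc` and `color` are reported as the attributes worth requesting.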

7. Granted invention patent

US08438291B2 Managing web tier session state objects in a content delivery network (CDN) (status: active)
Publication number: US08438291B2
Publication date: 2013-05-07
Application number: US12843278
Filing date: 2010-07-26
Applicants: Andrew T. Davis, Jay G. Parikh, Srikanth Thirumalai, William E. Weihl, Mark Tsimelzon
Inventors: Andrew T. Davis, Jay G. Parikh, Srikanth Thirumalai, William E. Weihl, Mark Tsimelzon
IPC classification: G06F15/16, G06F15/177, G06F15/173
CPC classification: H04L67/1095, G06F11/203, G06F11/2097, H04L67/14, H04L67/142, H04L67/148
Abstract: Business applications running on a content delivery network (CDN) having a distributed application framework can create, access and modify state for each client. Over time, a single client may desire to access a given application on different CDN edge servers within the same region and even across different regions. Each time, the application may need to access the latest “state” of the client even if the state was last modified by an application on a different server. A difficulty arises when a process or a machine that last modified the state dies or is temporarily or permanently unavailable. The present invention provides techniques for migrating session state data across CDN servers in a manner transparent to the user. A distributed application thus can access a latest “state” of a client even if the state was last modified by an application instance executing on a different CDN server, including a nearby (in-region) or a remote (out-of-region) server.

8. Granted invention patent

US07908279B1 Filtering invalid tokens from a document using high IDF token filtering (status: active)
Publication number: US07908279B1
Publication date: 2011-03-15
Application number: US11856581
Filing date: 2007-09-17
Applicants: Srikanth Thirumalai, Aswath Manoharan, Mark J. Tomko, Grant M. Emery, Vijai Mohan, Egidio Terra
Inventors: Srikanth Thirumalai, Aswath Manoharan, Mark J. Tomko, Grant M. Emery, Vijai Mohan, Egidio Terra
IPC classification: G06F7/00, G06F17/30, G06F17/21, G06F9/445
CPC classification: G06F17/2211, Y10S707/917
Abstract: Systems and methods for filtering tokens from a document for determining whether the document describes substantially similar subject matter compared to another document are described. In one embodiment, a first document is obtained. This document is organized into a plurality of fields, and at least some of the fields include tokens representing the subject matter described by the document. A field of this document is selected and a token from within the selected field having the highest inverse document frequency (IDF) is selected. Those tokens that have a higher IDF than the selected token are removed. Using the remaining tokens, a determination is made as to whether the first document describes substantially similar subject matter to the subject matter described by a second document. An indication is provided as to whether the first document describes substantially similar subject matter to that described by a second document according to the determination.
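The filtering step in this abstract can be sketched in Python as follows. The sketch assumes the textbook definition IDF = log(N / document frequency), treats tokens never seen in the corpus as having infinite IDF, and applies the ceiling taken from the selected field to every field of the document. The abstract does not state the IDF formula or how unseen tokens are handled, and all names and data are hypothetical.

```python
import math

def idf_table(corpus_docs):
    # Assumed formula: IDF = log(N / number of documents containing the token).
    n = len(corpus_docs)
    df = {}
    for tokens in corpus_docs:
        for tok in set(tokens):
            df[tok] = df.get(tok, 0) + 1
    return {tok: math.log(n / count) for tok, count in df.items()}

def filter_high_idf(document, reference_field, idf, unseen_idf=float("inf")):
    # The highest IDF found in the selected field becomes the ceiling.
    ceiling = max(idf.get(t, unseen_idf) for t in document[reference_field])
    # Drop every token, in any field, whose IDF exceeds that ceiling.
    return {field: [t for t in tokens if idf.get(t, unseen_idf) <= ceiling]
            for field, tokens in document.items()}

# Hypothetical tokenized corpus and document.
corpus = [["acme", "phone", "black"], ["acme", "phone", "white"],
          ["zen", "phone", "black"], ["acme", "case", "xq7z"]]
idf = idf_table(corpus)
doc = {"title": ["acme", "phone", "black"],
       "description": ["black", "phone", "xq7z", "qqqq"]}
filtered = filter_high_idf(doc, "title", idf)
print(filtered)
```

In this toy data the rarest title token is "black", so description tokens rarer than that ("xq7z" and the unseen "qqqq") are treated as noise and removed before any similarity comparison.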

9. Granted invention patent

US07904462B1 Comparison engine for identifying documents describing similar subject matter (status: active)
Publication number: US07904462B1
Publication date: 2011-03-08
Application number: US11953726
Filing date: 2007-12-10
Applicants: Srikanth Thirumalai, Aswath Manoharan, Mark J. Tomko, Grant M. Emery, Vijai Mohan, Egidio Terra
Inventors: Srikanth Thirumalai, Aswath Manoharan, Mark J. Tomko, Grant M. Emery, Vijai Mohan, Egidio Terra
IPC classification: G06F7/00, G06F17/00
CPC classification: G06Q30/06
Abstract: Systems and methods for determining whether a first document is a potential duplicate of a second document such that the two documents describe the same or substantially the same subject matter, wherein the first and second documents include attribute data in attribute fields. A set of rules is obtained for determining whether the first document is a potential duplicate of the second document. Moreover, for each rule in the set of rules, a determination is made as to whether data in a first set of attributes of the first document is contained in a second set of attributes of the second document. According to the results of the evaluated rules in the rules set, determining whether the first document is a potential duplicate of the second document. If, according to the evaluated rules in the rules set, the first document is determined to be a potential duplicate of the second document, storing a reference to the first document in a set of potential duplicates of the second document.
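The containment test in this abstract can be illustrated with the Python sketch below. It assumes that "contained" means every whitespace-separated token from the first document's attributes appears (case-insensitively) somewhere in the second document's attributes, and that all rules must hold for a potential duplicate. The abstract leaves both the containment test and the way rule results are combined open, so the rules, attribute names, and documents here are purely hypothetical.

```python
def tokens(doc, attrs):
    # Lowercased tokens drawn from the named attributes of a document.
    return {t for a in attrs for t in doc.get(a, "").lower().split()}

def contained(first, second, first_attrs, second_attrs):
    # True if the first document's tokens all appear in the second's attributes.
    source = tokens(first, first_attrs)
    target = tokens(second, second_attrs)
    return bool(source) and source <= target

def is_potential_duplicate(first, second, rules):
    # Assumption: every rule must hold.
    return all(contained(first, second, fa, sa) for fa, sa in rules)

# Each hypothetical rule pairs attributes of the first document with
# attributes of the second document that should contain their data.
RULES = [(("brand",), ("brand", "title")),
         (("model",), ("title", "description"))]

second = {"id": "B", "brand": "", "title": "Acme X1 Phone",
          "description": "smartphone"}
first = {"id": "A", "brand": "Acme", "model": "X1"}
other = {"id": "C", "brand": "Zen", "model": "X1"}

# Store references to potential duplicates of the second document.
potential_duplicates = {second["id"]: set()}
for candidate in (first, other):
    if is_potential_duplicate(candidate, second, RULES):
        potential_duplicates[second["id"]].add(candidate["id"])
print(potential_duplicates)
```

Document A is recorded as a potential duplicate of B because its brand and model both appear in B's title, while C is rejected because its brand does not.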

10. Granted invention patent

US07877465B2 Providing artifact and configuration cohesion across disparate portal application models (status: lapsed)
Publication number: US07877465B2
Publication date: 2011-01-25
Application number: US10891287
Filing date: 2004-07-14
Applicants: Prasant K. Kontamsetty, Srikanth Thirumalai, Michael C. Wanderski
Inventors: Prasant K. Kontamsetty, Srikanth Thirumalai, Michael C. Wanderski
IPC classification: G06F15/177
CPC classification: G06F17/24
Abstract: Under the present invention, a client-based editor is launched (e.g., from a web server or the like) within a client interface such as a browser. Upon being launched, initial configuration parameters are passed from a portal server to the editor. The present invention also provides a “communications tunnel” between the editor and the portal server in the form of a portlet interface on the web server. This is so that any characteristics expressed by the portal server (e.g., changes to the initial configuration parameters) can be pushed to the editor. Moreover, the portlet interface allows the editor to query the portal server to obtain any needed services (e.g. a spreadsheet computation).
