    • 1. Granted invention patent
    • Identifying potential duplicates of a document in a document corpus
    • US09195714B1
    • 2015-11-24
    • US13030114
    • 2011-02-17
    • Srikanth Thirumalai, Aswath Manoharan, Mark J. Tomko, Grant M. Emery, Vijai Mohan
    • G06F17/30
    • G06F17/30528, G06F17/30483, G06F17/3071
    • According to aspects of the disclosed subject matter, a method is provided for identifying a set of documents from a document corpus that are potential duplicates of a source document. A source document is obtained. A list of queries corresponding to the source document is identified. Each query in the identified list of queries is executed on the document corpus, wherein the execution of each query yields a corresponding results set identifying an ordered set of documents in the document corpus. For each document identified in each results set, a document score is generated for the identified document based on the identified document's ordinal position in its results set. A subset of the identified documents of the results set is selected according to the generated document scores that satisfy predetermined selection criteria. The selected subset of identified documents is stored or displayed.
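The rank-based scoring in the abstract above can be illustrated with a minimal Python sketch. The abstract does not say how an ordinal position becomes a score or how scores from several queries combine, so the reciprocal-rank formula, the summation across queries, and the `min_score` threshold are all assumptions made for illustration. The names `find_potential_duplicates` and `run_query` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def find_potential_duplicates(
    queries: Sequence[str],
    run_query: Callable[[str], List[str]],
    min_score: float,
) -> List[str]:
    """Score documents by their position in each ordered results set."""
    scores: Dict[str, float] = defaultdict(float)
    for query in queries:
        # run_query returns document ids ordered from best to worst match.
        for position, doc_id in enumerate(run_query(query), start=1):
            # Assumption: reciprocal rank, summed over all queries.
            scores[doc_id] += 1.0 / position
    # Assumption: the selection criterion is a simple score threshold.
    selected = [doc_id for doc_id, score in scores.items() if score >= min_score]
    return sorted(selected, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results sets standing in for a real search index.
fake_index = {"q1": ["a", "b", "c"], "q2": ["b", "a"], "q3": ["b"]}
candidates = find_potential_duplicates(
    ["q1", "q2", "q3"], lambda q: fake_index[q], min_score=1.0
)
```

With these made-up results sets, document `b` scores 2.5 and `a` scores 1.5, so both pass the threshold, while `c` (about 0.33) is dropped.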
    • 4. Granted invention patent
    • System and method for identifying feature phrases in item description information
    • US08838618B1
    • 2014-09-16
    • US13175124
    • 2011-07-01
    • Jianhui Wu, Nicholas R. Boyd, Srikanth Thirumalai
    • G06F17/30
    • G06F17/30625
    • Embodiments may include, for each item in a subset of items from a larger group of items, evaluating item description information about that item to identify a respective set of candidate phrases to be evaluated. Embodiments may also include, for each phrase in the sets of candidate phrases, generating multiple component scores based on one or more of the frequency with which that phrase occurs in the item description information for the subset of items and/or the frequency with which that phrase occurs in a corpus of item description information for the overall group of items. Embodiments may also include, for each phrase in the sets of candidate phrases, generating a respective phrase score based on the component scores generated for that phrase. Embodiments may include, based on phrase scores, selecting a subset of phrases from the sets of candidate phrases as being feature phrases for the subset of items.
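As a rough illustration of the phrase scoring described in the abstract above, the sketch below builds candidate phrases from word n-grams, computes two component scores per phrase (frequency within the item subset and rarity within the overall corpus), and multiplies them into a phrase score. The n-gram candidates, the specific component formulas, and the function names are assumptions; the abstract only states that component scores are frequency-based.

```python
import math
from collections import Counter
from typing import List, Set


def candidate_phrases(description: str, max_len: int = 2) -> Set[str]:
    """Assumption: candidates are word n-grams of up to max_len words."""
    words = description.lower().split()
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(words) - n + 1)
    }


def feature_phrases(subset: List[str], corpus: List[str], top_k: int = 3) -> List[str]:
    """Rank phrases that are common in the subset but rare in the corpus."""
    subset_df = Counter(p for d in subset for p in candidate_phrases(d))
    corpus_df = Counter(p for d in corpus for p in candidate_phrases(d))
    scores = {}
    for phrase, df in subset_df.items():
        subset_freq = df / len(subset)  # component 1: share of subset items
        # Component 2: smoothed rarity in the overall corpus.
        rarity = math.log((1 + len(corpus)) / (1 + corpus_df[phrase]))
        scores[phrase] = subset_freq * rarity  # assumed way to combine them
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_k]
```

A product of the two components is only one plausible combination; a weighted sum or a learned model would fit the abstract equally well.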
    • 5. Granted invention patent
    • System and method for genetic creation of a rule set for duplicate detection
    • US08577814B1
    • 2013-11-05
    • US13193285
    • 2011-07-28
    • Jianhui Wu, Srikanth Thirumalai
    • G06F15/18
    • G06N3/126
    • Embodiments may generate a population of candidate rules including multiple rule conditions for detecting duplicates, each duplicate representing different sets of item description information that describe a common item. For each candidate rule of the population, embodiments may apply that rule to a reference data set including known duplicates and non-duplicates. Embodiments may assign each candidate rule a fitness score generated with a fitness function based on the performance of that candidate rule. Embodiments may, based on the fitness scores, select a subset of the population of candidate rules as parents for the new generation of candidate rules. Embodiments may perform crossover and/or mutation operations on the parent candidate rules to generate the new generation of candidate rules. Embodiments may select from the new generation of candidate rules (or from subsequent generations of candidate rules), rules for inclusion within a rule set for detecting duplicates within item description information.
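The genetic loop in the abstract above (population, fitness on labeled data, parent selection, crossover and mutation) might look like the following sketch. The rule encoding is an assumption: each rule is a set of attributes that must all be equal for two items to count as duplicates. The accuracy-based fitness function, the attribute list, keeping the parents alongside their children, and all names are likewise hypothetical.

```python
import random

ATTRIBUTES = ["title", "brand", "color", "size"]  # hypothetical item attributes


def is_duplicate(rule, a, b):
    """A rule fires when every attribute it names is equal in both items."""
    return all(a.get(attr) == b.get(attr) for attr in rule)


def fitness(rule, labeled_pairs):
    """Assumption: fitness is plain accuracy on known (a, b, label) pairs."""
    hits = sum(is_duplicate(rule, a, b) == label for a, b, label in labeled_pairs)
    return hits / len(labeled_pairs)


def crossover(p1, p2, rng):
    """Take each attribute's presence from one randomly chosen parent."""
    child = frozenset(
        attr for attr in ATTRIBUTES if attr in (p1 if rng.random() < 0.5 else p2)
    )
    return child or p1  # never return an empty rule


def mutate(rule, rng):
    """Toggle one randomly chosen attribute in or out of the rule."""
    return (rule ^ {rng.choice(ATTRIBUTES)}) or rule


def evolve(labeled_pairs, generations=20, size=8, seed=0):
    rng = random.Random(seed)
    population = [
        frozenset(rng.sample(ATTRIBUTES, rng.randint(1, len(ATTRIBUTES))))
        for _ in range(size)
    ]
    for _ in range(generations):
        ranked = sorted(population, key=lambda r: fitness(r, labeled_pairs), reverse=True)
        parents = ranked[: size // 2]  # fittest half become parents
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(size - len(parents))
        ]
        population = parents + children  # assumption: parents survive too
    return max(population, key=lambda r: fitness(r, labeled_pairs))
```

Because the random generator is seeded, repeated runs on the same reference data return the same rule, which makes the sketch easier to inspect.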
    • 6. Granted invention patent
    • System and method for identifying structured data items lacking requisite information for rule-based duplicate detection
    • US08527475B1
    • 2013-09-03
    • US13239068
    • 2011-09-21
    • Roshan Ram Rammohan, Madhu M Kurup, Srikanth Thirumalai
    • G06F17/30
    • G06F17/30489
    • Embodiments of a system and method for identifying structured data items lacking requisite information for rule-based duplicate detection are described. Embodiments may include generating a deficiency score for each of multiple structured data items including applying a set of rules based on duplicate detection techniques to each given structured data item in order to perform a comparison of the given structured data item to itself. The deficiency score of the given structured data item may be based on a result of the comparison. Embodiments may also include, based on the deficiency scores of the structured data items, identifying one or more deficient structured data items having less than a requisite quantity of information for performing duplicate detection on structured data items. Embodiments may also include identifying one or more key attributes missing from some of the one or more deficient structured data items and requesting those key attributes.
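The self-comparison idea in the abstract above can be sketched as follows: an item that cannot be matched to itself by the duplicate-detection rules is missing information those rules need. The rule set, the requirement that attributes be non-empty to match, the score formula (share of rules that fail on self-comparison), and the threshold are all assumptions for illustration.

```python
# Hypothetical rules: each names the attributes that must be present and equal.
RULES = [("upc",), ("brand", "model"), ("title", "brand")]


def rule_matches(rule, a, b):
    """A rule matches only when every attribute is non-empty and equal."""
    return all(a.get(key) and a.get(key) == b.get(key) for key in rule)


def deficiency_score(item):
    """Share of rules that fail when the item is compared with itself."""
    failed = sum(not rule_matches(rule, item, item) for rule in RULES)
    return failed / len(RULES)


def find_deficient(items, threshold=0.5):
    """Assumption: an item is deficient when its score reaches the threshold."""
    return [item for item in items if deficiency_score(item) >= threshold]


def missing_attributes(item):
    """Key attributes used by the rules that the item does not provide."""
    return sorted({key for rule in RULES for key in rule if not item.get(key)})
```

For example, an item with only `title` and `brand` fails two of the three rules above and would be reported as missing `model` and `upc`.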
    • 7. Granted invention patent
    • Managing web tier session state objects in a content delivery network (CDN)
    • US08438291B2
    • 2013-05-07
    • US12843278
    • 2010-07-26
    • Andrew T. Davis, Jay G. Parikh, Srikanth Thirumalai, William E. Weihl, Mark Tsimelzon
    • G06F15/16, G06F15/177, G06F15/173
    • H04L67/1095, G06F11/203, G06F11/2097, H04L67/14, H04L67/142, H04L67/148
    • Business applications running on a content delivery network (CDN) having a distributed application framework can create, access and modify state for each client. Over time, a single client may desire to access a given application on different CDN edge servers within the same region and even across different regions. Each time, the application may need to access the latest “state” of the client even if the state was last modified by an application on a different server. A difficulty arises when a process or a machine that last modified the state dies or is temporarily or permanently unavailable. The present invention provides techniques for migrating session state data across CDN servers in a manner transparent to the user. A distributed application thus can access a latest “state” of a client even if the state was last modified by an application instance executing on a different CDN server, including a nearby (in-region) or a remote (out-of-region) server.
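The abstract above states the goal (an application can read the latest client state even when the server that last wrote it is unavailable) but not the mechanism. The sketch below shows one simple way to meet that goal, using versioned state copied to reachable peers on every write. This replication scheme, the `EdgeServer` class, and both function names are assumptions and should not be read as the patented technique.

```python
from dataclasses import dataclass, field


@dataclass
class EdgeServer:
    name: str
    alive: bool = True
    # session_id -> (version, state); hypothetical in-memory store
    store: dict = field(default_factory=dict)


def write_state(server, peers, session_id, state):
    """Write a new version locally and copy it to every reachable peer."""
    targets = [s for s in (server, *peers) if s.alive]
    latest = max((s.store.get(session_id, (0, None))[0] for s in targets), default=0)
    for target in targets:
        target.store[session_id] = (latest + 1, state)
    return latest + 1


def read_latest(servers, session_id):
    """Return the highest-versioned state held by any reachable server."""
    copies = [s.store[session_id] for s in servers if s.alive and session_id in s.store]
    return max(copies, key=lambda copy: copy[0])[1] if copies else None
```

In this sketch, if server `a` writes a session twice and then goes down, a read through any surviving peer that received the second write still returns the newest state.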
    • 9. Granted invention patent
    • Comparison engine for identifying documents describing similar subject matter
    • US07904462B1
    • 2011-03-08
    • US11953726
    • 2007-12-10
    • Srikanth Thirumalai, Aswath Manoharan, Mark J. Tomko, Grant M. Emery, Vijai Mohan, Egidio Terra
    • G06F7/00, G06F17/00
    • G06Q30/06
    • Systems and methods for determining whether a first document is a potential duplicate of a second document such that the two documents describe the same or substantially the same subject matter, wherein the first and second documents include attribute data in attribute fields. A set of rules is obtained for determining whether the first document is a potential duplicate of the second document. Moreover, for each rule in the set of rules, a determination is made as to whether data in a first set of attributes of the first document is contained in a second set of attributes of the second document. According to the results of the evaluated rules in the rules set, determining whether the first document is a potential duplicate of the second document. If, according to the evaluated rules in the rules set, the first document is determined to be a potential duplicate of the second document, storing a reference to the first document in a set of potential duplicates of the second document.
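The containment rules in the abstract above can be sketched in a few lines. Each rule here pairs a set of attributes of the first document with a set of attributes of the second and holds when every word from the former appears in the latter. The word-level tokenization, the requirement that all rules hold, and the names used are assumptions; the abstract leaves open how rule results are combined.

```python
def tokens(doc, attrs):
    """Lower-cased words found in the given attribute fields of a document."""
    return {word for attr in attrs for word in str(doc.get(attr, "")).lower().split()}


def rule_holds(rule, first, second):
    """True when the first document's data is contained in the second's."""
    source_attrs, target_attrs = rule
    needed = tokens(first, source_attrs)
    return bool(needed) and needed <= tokens(second, target_attrs)


def record_if_duplicate(rules, first, second, potential_duplicates):
    """Assumption: the first document is flagged only if every rule holds."""
    if all(rule_holds(rule, first, second) for rule in rules):
        # Store a reference to the first document under the second one.
        potential_duplicates.setdefault(second["id"], []).append(first["id"])
        return True
    return False


# Hypothetical rules: (attributes of first doc, attributes of second doc).
rules = [(("brand",), ("brand", "title")), (("model",), ("title", "description"))]
```

A rule like the first one lets a brand stored in a dedicated field of one document match a brand that only appears in the title of the other.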