    • 4. Granted invention patent
    • Entity summarization and comparison
    • US09251249B2
    • 2016-02-02
    • US13316838
    • 2011-12-12
    • Pavel Dmitriev, Wei Zhuang
    • Pavel Dmitriev, Wei Zhuang
    • G06F17/30
    • G06F17/30616, G06F17/30687
    • An entity summarization system is described herein that mines the Internet and other data sources to provide answers to questions such as the relative sentiment of users towards various brands. The system uses a controlled vocabulary list describing a specific aspect of entities of interest. Given an entity name, the system scans the whole content corpus to collect statistics on the words that occur most frequently in the context of the entity name, taking into account proximity information, to produce a weighted list of vocabulary terms describing the entity. Two entities can be compared by normalizing and comparing their weighted term lists. In some embodiments, the system performs these procedures efficiently by leveraging an N-gram web model. Thus, the system provides an automated way to compare two entities to derive information about how users feel about the entities at any given time.
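The abstract above outlines a pipeline: collect proximity-aware co-occurrence statistics for an entity against a controlled vocabulary, normalize them into a weighted term list, then compare two such lists. The following is a minimal sketch of that idea, not the patented method. The function names, the whitespace tokenizer, the single-token entity name, the fixed window, and the 1/distance weighting are all assumptions made for illustration.

```python
from collections import defaultdict


def summarize_entity(entity, documents, vocabulary, window=5):
    """Return a normalized {term: weight} map describing `entity`.

    Assumption: a vocabulary term contributes 1/distance for every
    occurrence within `window` tokens of the entity name, so closer
    terms count more. The entity is assumed to be a single token.
    """
    weights = defaultdict(float)
    entity = entity.lower()
    for doc in documents:
        tokens = doc.lower().split()  # naive whitespace tokenizer
        positions = [i for i, tok in enumerate(tokens) if tok == entity]
        for pos in positions:
            lo = max(0, pos - window)
            hi = min(len(tokens), pos + window + 1)
            for j in range(lo, hi):
                if j != pos and tokens[j] in vocabulary:
                    weights[tokens[j]] += 1.0 / abs(j - pos)
    total = sum(weights.values())
    # Normalize so that lists for different entities are comparable.
    return {t: w / total for t, w in weights.items()} if total else {}


def compare_entities(a, b):
    """Per-term difference between two normalized term lists (a minus b)."""
    return {t: a.get(t, 0.0) - b.get(t, 0.0) for t in set(a) | set(b)}
```

A positive value in the result of `compare_entities` means the term is relatively more associated with the first entity, a negative value with the second. Normalizing before comparing keeps a frequently mentioned entity from dominating simply because it appears in more documents.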
    • 5. Invention patent application
    • ENTITY SUMMARIZATION AND COMPARISON
    • US20130151538A1
    • 2013-06-13
    • US13316838
    • 2011-12-12
    • Pavel Dmitriev, Wei Zhuang
    • Pavel Dmitriev, Wei Zhuang
    • G06F17/30
    • G06F17/30616, G06F17/30687
    • An entity summarization system is described herein that mines the Internet and other data sources to provide answers to questions such as the relative sentiment of users towards various brands. The system uses a controlled vocabulary list describing a specific aspect of entities of interest. Given an entity name, the system scans the whole content corpus to collect statistics on the words that occur most frequently in the context of the entity name, taking into account proximity information, to produce a weighted list of vocabulary terms describing the entity. Two entities can be compared by normalizing and comparing their weighted term lists. In some embodiments, the system performs these procedures efficiently by leveraging an N-gram web model. Thus, the system provides an automated way to compare two entities to derive information about how users feel about the entities at any given time.
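The abstract mentions that some embodiments avoid scanning the raw corpus by leveraging an N-gram web model, without saying how the model is queried. The sketch below assumes a hypothetical in-memory table that maps token tuples to counts and derives the weighted term list from it. The table layout, the function name, and the count/distance weighting are illustrative assumptions, and overlapping n-grams may count the same co-occurrence more than once, so the weights are only meaningful relative to each other.

```python
from collections import defaultdict


def summarize_from_ngrams(entity, ngram_counts, vocabulary):
    """Build a normalized {term: weight} map from precomputed n-gram counts.

    `ngram_counts` is a hypothetical {tuple_of_tokens: count} table.
    Each n-gram containing the entity adds count/distance to every
    vocabulary term it also contains, where distance is the number of
    token positions between the term and the nearest entity mention.
    """
    weights = defaultdict(float)
    for ngram, count in ngram_counts.items():
        entity_positions = [i for i, tok in enumerate(ngram) if tok == entity]
        if not entity_positions:
            continue
        for j, tok in enumerate(ngram):
            if tok in vocabulary and tok != entity:
                distance = min(abs(j - i) for i in entity_positions)
                weights[tok] += count / distance
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()} if total else {}
```

The cost of this approach depends on the number of distinct n-grams that mention the entity rather than on the size of the underlying corpus, which is one plausible reading of the efficiency claim.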
    • 6. Invention patent application
    • HOST-BASED SEED SELECTION ALGORITHM FOR WEB CRAWLERS
    • US20100114858A1
    • 2010-05-06
    • US12259164
    • 2008-10-27
    • Pavel Dmitriev
    • Pavel Dmitriev
    • G06F17/30
    • G06F16/9537
    • A host-based seed selection process considers factors such as quality, importance and potential yield of hosts in a decision to use a document of a host as a seed. A subset of a plurality of hosts is determined, including some but not all of the plurality of the hosts, according to an indication of importance of the hosts, according to an expected yield of new documents for the hosts, and according to preferences for the markets the hosts belong to. At least one seed is generated for each host of the determined subset of hosts, wherein each generated at least one seed includes an indication of a document in the linked database of documents. The generated seeds are provided to be accessible by a database crawler.
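The seed selection process described in the abstract above can be sketched as three steps: score each host, keep a proper subset of the best hosts, and emit at least one seed document per kept host. This is a minimal sketch under stated assumptions, not the claimed algorithm. The `Host` fields, the multiplicative score, and the `market_preference` mapping are hypothetical; the abstract names importance, expected yield, and market preference as inputs but does not say how they are combined.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    importance: float       # assumed score in [0, 1], e.g. link-based
    expected_yield: float   # assumed share of new documents, in [0, 1]
    market: str             # market the host belongs to
    documents: list = field(default_factory=list)  # known URLs, best first


def select_seeds(hosts, market_preference, max_hosts, seeds_per_host=1):
    """Return seed URLs for the `max_hosts` best-scoring hosts.

    The score is an assumed product of importance, expected yield,
    and the preference weight of the host's market (0.0 if unknown).
    """
    def score(host):
        preference = market_preference.get(host.market, 0.0)
        return host.importance * host.expected_yield * preference

    # Only hosts with at least one known document can provide a seed.
    candidates = [h for h in hosts if h.documents]
    chosen = sorted(candidates, key=score, reverse=True)[:max_hosts]
    return [url for h in chosen for url in h.documents[:seeds_per_host]]
```

Setting `max_hosts` below the number of candidate hosts yields the "some but not all" subset the abstract describes, and the returned list is what would be handed to the crawler.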