    • 3. Invention application
    • NAMED ENTITY RESOLUTION USING MULTIPLE TEXT SOURCES
    • US20100094831A1
    • 2010-04-15
    • US12251452
    • 2008-10-14
    • Matthew F. Hurst
    • Matthew F. Hurst
    • G06F7/06; G06F17/30
    • G06F17/278
    • An arrangement for resolving ambiguity among named entities in web based text documents is provided in which multiple documents are utilized that are of different genres and will thus typically use different degrees of precision when referring to named entities. When an ambiguous named entity is located in a document, any links contained in that document are followed to other documents. If a linked document includes a named entity that is fully specified (i.e., includes both a first and last name), then this information can be used to resolve the ambiguity of the named entity in the original document.
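The link-following idea in the abstract above can be illustrated with a minimal sketch. It is not the patented method: the `Document` class, the `resolve` function, the in-memory `corpus`, the example URLs, and the rule that two adjacent capitalized words form a fully specified name are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for a web document: its text plus outgoing links."""
    text: str
    links: list[str] = field(default_factory=list)


# Assumption: two adjacent capitalized words count as a fully specified name.
FULL_NAME = re.compile(r"\b([A-Z][a-z]+) ([A-Z][a-z]+)\b")


def resolve(mention: str, url: str, corpus: dict[str, Document]) -> set[str]:
    """Follow the links of the document at `url` and collect every full name
    in the linked documents whose first or last name equals `mention`."""
    candidates = set()
    for link in corpus[url].links:
        linked = corpus.get(link)
        if linked is None:  # link points outside the toy corpus
            continue
        for first, last in FULL_NAME.findall(linked.text):
            if mention in (first, last):
                candidates.add(f"{first} {last}")
    return candidates


# A casual blog post uses only a surname; the linked news article is precise.
corpus = {
    "blog/post": Document("Smith gave a great talk yesterday.", ["news/article"]),
    "news/article": Document("The keynote was delivered by Jane Smith on Monday."),
}
resolved = resolve("Smith", "blog/post", corpus)
```

If exactly one candidate comes back, the ambiguous mention can be replaced by it. Several candidates would need a further tie-breaking step, which this sketch leaves out.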
    • 5. Invention application
    • METHOD AND APPARATUS FOR WEB CRAWLING
    • US20100250516A1
    • 2010-09-30
    • US12413528
    • 2009-03-28
    • Alexey Maykov; Matthew F. Hurst
    • Alexey Maykov; Matthew F. Hurst
    • G06F7/10; G06F17/30; G06F7/08; G06F9/46
    • G06F17/30864
    • A method and system for retrieving data from a webpage is described herein. A scheduler organizes, or rather orders, a group of webpage identifiers according to some predetermined criteria. Based upon this ordering, a fetcher may be configured to fetch data from webpages identified by the identifiers. To promote efficiency and reduce the latency between when a webpage is updated and when the fetcher retrieves data from the webpage, the scheduler may be configured to reorder the identifiers in such a manner that it causes an identifier that was less relevant, and would not have been sent to the fetcher, to become more relevant. In this way, the method and system may be particularly useful for retrieving data related to webpages that are updated frequently, such as social media webpages, for example.
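The scheduler and fetcher interaction described in the abstract above might look roughly like the following sketch. The `Scheduler` class, its `reprioritize` and `next_batch` methods, the numeric scores, and the example URLs are hypothetical; the abstract does not specify the ordering criteria, so a single relevance score per identifier is assumed here.

```python
class Scheduler:
    """Toy scheduler: orders webpage identifiers by an assumed relevance score."""

    def __init__(self, scores: dict[str, float]):
        self.scores = dict(scores)

    def reprioritize(self, url: str, new_score: float) -> None:
        """Change one identifier's score, which reorders it on the next batch."""
        self.scores[url] = new_score

    def next_batch(self, size: int) -> list[str]:
        """Return the `size` most relevant identifiers for the fetcher.
        Ties are broken alphabetically so the order is deterministic."""
        ordered = sorted(self.scores, key=lambda u: (-self.scores[u], u))
        return ordered[:size]


def fetch(urls: list[str]) -> dict[str, str]:
    """Stand-in fetcher: a real one would issue HTTP requests."""
    return {u: f"<contents of {u}>" for u in urls}


scheduler = Scheduler({
    "a.example/news": 0.9,
    "b.example/blog": 0.5,
    "c.example/feed": 0.1,
})

# With room for two pages per batch, the low-scored feed is not fetched.
first_batch = scheduler.next_batch(2)

# Suppose a signal arrives that the feed was just updated: boosting its score
# moves it into the batch although it would otherwise have been skipped.
scheduler.reprioritize("c.example/feed", 1.0)
second_batch = scheduler.next_batch(2)
pages = fetch(second_batch)
```

Sorting on every call keeps the sketch short. A crawler with many identifiers would more plausibly keep a priority queue and update entries in place.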
    • 9. Granted invention patent
    • Method and apparatus for web crawling
    • US08712992B2
    • 2014-04-29
    • US12413528
    • 2009-03-28
    • Alexey Maykov; Matthew F. Hurst
    • Alexey Maykov; Matthew F. Hurst
    • G06F7/00; G06F7/08
    • G06F17/30864
    • A method and system for retrieving data from a webpage is described herein. A scheduler organizes, or rather orders, a group of webpage identifiers according to some predetermined criteria. Based upon this ordering, a fetcher may be configured to fetch data from webpages identified by the identifiers. To promote efficiency and reduce the latency between when a webpage is updated and when the fetcher retrieves data from the webpage, the scheduler may be configured to reorder the identifiers in such a manner that it causes an identifier that was less relevant, and would not have been sent to the fetcher, to become more relevant. In this way, the method and system may be particularly useful for retrieving data related to webpages that are updated frequently, such as social media webpages, for example.
    • 10. Invention application
    • EXTRACTION OF CERTAIN TYPES OF ENTITIES
    • US20110131244A1
    • 2011-06-02
    • US12626905
    • 2009-11-29
    • Amir J. Padovitz; Matthew F. Hurst
    • Amir J. Padovitz; Matthew F. Hurst
    • G06F17/30; G06F15/18
    • G06F16/367; G06F16/355
    • Certain types of entities may be extracted from a document. In one example, the entities to be recognized are cultural entities, such as the names of movies, video games, books, etc. For each such entity, a concept graph may be built that shows the relationship between the entity itself and other entities, such as the relationship between a movie and the actor(s) who act in the movie. When a candidate entity name is detected in the document, the concept graph may be used to look for other entities that appear in the context of the candidate entity. The presence of related entities in the context of the candidate may be used to disambiguate the meaning of the candidate. For example, a common word like “up” might be recognized as the name of a movie if the names of actors or characters in that movie appear near the word “up”.
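The context check in the abstract above, where a common word such as "up" is accepted as a movie title only when related entities appear nearby, can be sketched as follows. The `CONCEPT_GRAPH` contents, the `is_entity` function, the character window, and the `min_related` threshold are illustrative assumptions; the abstract does not say how the graph is stored or how context is measured.

```python
import re

# Assumed toy concept graph: each entity maps to names related to it
# (for a movie, its actors, characters, or studio). All keys are lowercase.
CONCEPT_GRAPH = {
    "up": {"ed asner", "carl fredricksen", "pixar"},
}


def is_entity(candidate: str, text: str, graph: dict[str, set[str]],
              window: int = 60, min_related: int = 1) -> bool:
    """Return True if some occurrence of `candidate` in `text` has at least
    `min_related` related entities within `window` characters on either side."""
    lowered = text.lower()
    name = candidate.lower()
    related = graph.get(name, set())
    for match in re.finditer(rf"\b{re.escape(name)}\b", lowered):
        start = max(0, match.start() - window)
        context = lowered[start:match.end() + window]
        hits = sum(1 for other in related if other in context)
        if hits >= min_related:
            return True
    return False


movie_mention = is_entity(
    "up", "I loved Ed Asner's voice work in Up last night.", CONCEPT_GRAPH)
plain_word = is_entity(
    "up", "Please pick up the groceries on your way home.", CONCEPT_GRAPH)
```

Raising `min_related` trades recall for precision: requiring two related names nearby would reject more ordinary uses of the word but also miss short mentions of the movie.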