    • 8. Granted invention patent
    • Method of identifying the language of a textual passage using short word and/or n-gram comparisons
    • US07359851B2
    • 2008-04-15
    • US10757313
    • 2004-01-14
    • Xiang Tong; Gregory T. Grefenstette; David A. Evans
    • G06F17/20; G06F17/27
    • G06F17/218; G06F17/2247; G06F17/275
    • A method and system for identifying the language of a textual passage are disclosed. The method and system include parsing the textual passage into n-grams, assigning an initial weight to each n-gram, and adjusting the weight initially assigned to a word or n-gram parsed from the textual passage. The initially assigned weight is adjusted in proportion to the inverse of the number of languages in which the word or n-gram appears. Reducing the weight assigned to such words or n-grams diminishes, without completely eliminating, their importance relative to other words or n-grams parsed from the same textual passage when determining the language of the passage. The method and system thus appropriately weight the short words or n-grams common to multiple languages without affecting the short words or n-grams that are not common to several languages.
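The weighting idea in the abstract of US07359851B2 can be illustrated with a minimal sketch. This is not the patented implementation: the `PROFILES` table, the function names, and the use of whitespace-separated short words instead of full n-gram parsing are all assumptions made for illustration. Each matching token starts at an initial weight, and that weight is divided by the number of languages whose profile contains the token, so shared tokens count less but are never discarded.

```python
from collections import defaultdict

# Hypothetical toy profiles: short words assumed typical of each language.
PROFILES = {
    "en": {"the", "of", "and", "in", "a"},
    "fr": {"le", "de", "et", "la", "a"},
    "es": {"el", "de", "y", "la", "a"},
}


def language_counts(profiles):
    """Count in how many language profiles each token appears."""
    counts = defaultdict(int)
    for tokens in profiles.values():
        for tok in tokens:
            counts[tok] += 1
    return counts


def identify_language(text, profiles=PROFILES, initial_weight=1.0):
    """Score each language; shared tokens are down-weighted by 1 / n_langs."""
    counts = language_counts(profiles)
    scores = {lang: 0.0 for lang in profiles}
    for tok in text.lower().split():
        n_langs = counts.get(tok, 0)
        if n_langs == 0:
            continue  # token is in no profile, so it carries no evidence
        # Adjusted weight: proportional to the inverse of the language count.
        weight = initial_weight / n_langs
        for lang, tokens in profiles.items():
            if tok in tokens:
                scores[lang] += weight
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    print(identify_language("la maison de la reine et le roi"))
```

In the sample sentence, "la" and "de" occur in both the French and Spanish toy profiles and contribute 0.5 each, while "et" and "le" occur only in the French profile and contribute the full 1.0.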
    • 9. Granted invention patent
    • System for automatically generating queries
    • US06778979B2
    • 2004-08-17
    • US09683235
    • 2001-12-05
    • Gregory T. Grefenstette; James G. Shanahan
    • G06F17/30
    • G06F17/30643; G06F17/30722; Y10S707/99932; Y10S707/99933; Y10S707/99942; Y10S707/99943
    • A system generates a query using an entity extractor, a categorizer, a query generator, and a short run aspect vector generator. The entity extractor identifies a set of entities in selected document content so that related information can be searched for using an information retrieval system. The categorizer defines an organized classification of document content, in which each class has an associated classification label that corresponds to a category of information in the information retrieval system. The categorizer assigns the selected document content a classification label from the organized classification of content. The query generator formulates a query that restricts a search at the information retrieval system to the category of information identified by the assigned classification label. The short run aspect vector generator generates terms for further refining the query using context information surrounding the set of entities in the selected document content.
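The pipeline described in the abstract of US06778979B2 can be sketched roughly as follows. Everything here is a hypothetical stand-in rather than the patented system: the gazetteer `KNOWN_ENTITIES`, the keyword table `CATEGORIES`, the stopword list, the window size, and the dictionary-shaped query are assumptions chosen only to show how the four components might fit together.

```python
import re
from collections import Counter

# Hypothetical gazetteer and category keywords, for illustration only.
KNOWN_ENTITIES = ["Acme Corp", "Jane Doe"]
CATEGORIES = {
    "sports": {"match", "team", "season", "goal"},
    "finance": {"shares", "market", "profit", "bank"},
}
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "as"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def extract_entities(text):
    """Toy entity extractor: gazetteer entries that occur in the text."""
    return [e for e in KNOWN_ENTITIES if e in text]


def categorize(text):
    """Toy categorizer: label whose keyword set overlaps the text most."""
    words = set(tokenize(text))
    return max(CATEGORIES, key=lambda label: len(words & CATEGORIES[label]))


def aspect_terms(text, entities, window=3, top_k=3):
    """Most frequent non-stopword terms within `window` tokens of an entity."""
    tokens = tokenize(text)
    entity_words = {w for e in entities for w in tokenize(e)}
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok in entity_words:
            for ctx in tokens[max(0, i - window): i + window + 1]:
                if ctx not in STOPWORDS and ctx not in entity_words:
                    counts[ctx] += 1
    return [term for term, _ in counts.most_common(top_k)]


def build_query(text):
    """Combine entities, a category restriction, and refining context terms."""
    entities = extract_entities(text)
    return {
        "category": categorize(text),  # restricts the search to one category
        "entities": entities,
        "refine_terms": aspect_terms(text, entities),
    }


if __name__ == "__main__":
    doc = "Acme Corp shares rose sharply as the market rallied and profit beat forecasts."
    print(build_query(doc))
```

A real system would replace the gazetteer lookup and keyword overlap with trained components; the sketch only shows the data flow from document content to a category-restricted, context-refined query.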