    • 5. Granted invention patent
    • System and method for categorizing objects in combined categories
    • US5943670A
    • 1999-08-24
    • US976246
    • 1997-11-21
    • John Martin Prager
    • John Martin Prager
    • G06F17/30
    • G06F17/30705; Y10S707/99935; Y10S707/99936
    • The present invention is a system and method for determining whether the best category for an object under investigation is a mixture of preexisting categories, and how the mixture is constituted. This invention is useful both for suggesting the need for new categories, and for a fixed set of categories, determining whether a document should be assigned to multiple categories. The objects of the categorization system are typically, but need not be, documents. Categorization may be by subject-matter, language or other criteria. The invention causes extra information to be stored in a category index, so that the determination of mixed categories using the methods presented here is performed extremely efficiently.
    • 7. Granted invention patent
    • Identifying duplicate documents from search results without comparing document content
    • US5913208A
    • 1999-06-15
    • US677059
    • 1996-07-09
    • Eric William Brown; John Martin Prager
    • Eric William Brown; John Martin Prager
    • G06F17/30
    • G06F17/30864; G06F17/30696; Y10S707/959; Y10S707/99933; Y10S707/99935
    • A computer system has a document collection of one or more documents and one or more indexes that each include an inverted file with one or more terms. Each of the terms is associated with one or more document identifiers. The index further includes a document catalog that associates each of the document identifiers with one or more attributes, either intrinsic or non-intrinsic. A search engine process produces a hit list having one or more hit list entries. Each hit list entry, with one or more hit list attributes, is associated with one of the documents that is determined by the search engine to be relevant to the query. A formatter processor selects one or more of the hit list attributes, identified by a hit list attribute selector, and then compares the selected attributes of two or more entries on the hit list to determine whether or not the documents associated with these entries are duplicate instances of one another. The determination can be made without examining the content of the documents associated with the entries.
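The attribute comparison described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the entry layout (dictionaries with a `doc_id` key), the attribute names `title`, `size`, and `url`, and the function name `find_duplicates` are all assumptions made for illustration. The sketch groups hit-list entries whose selected attribute values are identical and never opens the documents themselves.

```python
from collections import defaultdict


def find_duplicates(hit_list, selected_attributes):
    """Group hit-list entries whose selected attribute values all match.

    Only catalog-style attributes are compared; document content is never read.
    The entry layout and attribute names are hypothetical.
    """
    groups = defaultdict(list)
    for entry in hit_list:
        key = tuple(entry.get(name) for name in selected_attributes)
        if None in key:
            # An entry missing a selected attribute cannot be compared; skip it.
            continue
        groups[key].append(entry["doc_id"])
    # Keep only the groups that contain more than one document.
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical hit list: two copies of one report served from different hosts.
hits = [
    {"doc_id": 1, "title": "Annual report", "size": 2048, "url": "http://a.example/r"},
    {"doc_id": 2, "title": "Annual report", "size": 2048, "url": "http://b.example/r"},
    {"doc_id": 3, "title": "Press release", "size": 512, "url": "http://a.example/p"},
]
duplicates = find_duplicates(hits, ["title", "size"])
```

Which attributes to select is the key design choice: comparing on `title` and `size` flags documents 1 and 2 as likely duplicates, while comparing on `url` alone would flag nothing.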
    • 8. Granted invention patent
    • System, method and program product for answering questions using a search engine
    • US06665666B1
    • 2003-12-16
    • US09495645
    • 2000-02-01
    • Eric William Brown; Anni R. Coden; John Martin Prager; Dragomir Radkov Radev
    • Eric William Brown; Anni R. Coden; John Martin Prager; Dragomir Radkov Radev
    • G06F17/30
    • G06F17/30672; Y10S707/99935
    • The present invention is a system, method, and program product that comprises a computer with a collection of documents to be searched. The documents contain free form (natural language) text. We define a set of labels called QA-Tokens, which function as abstractions of phrases or question-types. We define a pattern file, which consists of a number of pattern records, each of which has a question template, an associated question word pattern, and an associated set of QA-Tokens. We describe a query-analysis process which receives a query as input and matches it to one or more of the question templates, where a priority algorithm determines which match is used if there is more than one. The query-analysis process then replaces the associated question word pattern in the matching query with the associated set of QA-Tokens, and possibly some other words. This results in a processed query having some combination of original query tokens, new tokens from the pattern file, and QA-Tokens, possibly with weights. We describe a pattern-matching process that identifies patterns of text in the document collection and augments the location with corresponding QA-Tokens. We define a text index data structure which is an inverted list of the locations of all of the words in the document collection, together with the locations of all of the augmented QA-Tokens. A search process then matches the processed query against a window of a user-selected number of sentences that is slid across the document texts. A hit-list of top-scoring windows is returned to the user.
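The query-analysis and sliding-window steps in this abstract can be sketched roughly in Python as follows. This is an illustration under stated assumptions, not the patented method: the token names (`PERSON$`, `DATE$`, `TIME$`, `DURATION$`), the three pattern records, the rule that the first matching record wins, and the scoring by count of matched terms are all hypothetical. The sketch also assumes each sentence has already been annotated with QA-Tokens.

```python
import re

# Hypothetical pattern records: (question word pattern, associated QA-Tokens).
# List order stands in for the priority algorithm: the first match wins.
PATTERNS = [
    (re.compile(r"^how (long|many years)\b", re.I), ["DURATION$"]),
    (re.compile(r"^when\b", re.I), ["DATE$", "TIME$"]),
    (re.compile(r"^who\b", re.I), ["PERSON$"]),
]


def analyze_query(query):
    """Replace the question word pattern with its QA-Tokens; keep other words."""
    for pattern, tokens in PATTERNS:
        if pattern.search(query):
            rest = pattern.sub("", query, count=1)
            return tokens + re.findall(r"[a-z0-9]+", rest.lower())
    return re.findall(r"[a-z0-9]+", query.lower())


def best_windows(sentences, query_terms, window_size=1, top_k=1):
    """Slide a window of `window_size` sentences and rank windows by matches.

    `sentences` is a list of term sets (words plus QA-Tokens from annotation).
    Returns (score, start_index) pairs, best first.
    """
    wanted = set(query_terms)
    scored = []
    for start in range(max(1, len(sentences) - window_size + 1)):
        window_terms = set().union(*sentences[start:start + window_size])
        scored.append((len(wanted & window_terms), start))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_k]


# Hypothetical annotated document: one term set per sentence.
document = [
    {"the", "weather", "was", "cold"},
    {"alexander", "graham", "bell", "PERSON$", "invented", "the", "telephone"},
    {"it", "first", "rang", "in", "1876", "DATE$"},
]
terms = analyze_query("Who invented the telephone?")
top = best_windows(document, terms, window_size=1)
```

Because the person name in the second sentence carries the `PERSON$` annotation, the processed query can match it even though the query never mentions the name.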
    • 9. Granted invention patent
    • System and method for determining confidence levels for the results of a categorization system
    • US6003027A
    • 1999-12-14
    • US976349
    • 1997-11-21
    • John Martin Prager
    • John Martin Prager
    • G06F17/30
    • G06F17/30705; Y10S707/99935; Y10S707/99936
    • After a categorization process has been run, the scores of the two top-ranking categories, along with the size or number of features in the object being categorized, are passed to a confidence assignment process. This process determines a value for the confidence in the top category based on the evidence afforded by the input parameters. The magnitude of this confidence value determines whether the system can accept the automatic categorization results or whether human involvement is required. This invention also describes the process of determining the optimal value of an internal scaling parameter in the confidence assignment process. The construction of a threshold table based on this parameter is also described. The threshold table matches confidence values against error levels. For a given error rate, the previously assigned confidence determines whether the categorization results can be accepted without need for human intervention. This invention maximizes the number of objects that can be automatically processed for a given error rate.
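The abstract does not give the confidence formula, so the following Python sketch only illustrates the general shape of the idea. Everything specific in it is an assumption: the formula (relative margin between the top two scores, damped for small objects), the default `scale` value standing in for the internal scaling parameter, and the numbers in the threshold table. In the patent, the scaling parameter and table are derived by a procedure not reproduced here.

```python
def confidence(top_score, second_score, num_features, scale=10.0):
    """Hypothetical confidence in the top category.

    Assumed formula: the relative margin between the two best scores,
    multiplied by a factor that grows with the number of features.
    """
    if top_score <= 0:
        return 0.0
    margin = (top_score - second_score) / top_score
    size_factor = num_features / (num_features + scale)
    return margin * size_factor


# Hypothetical threshold table: (error rate, minimum confidence needed).
THRESHOLDS = [(0.01, 0.8), (0.05, 0.6), (0.10, 0.4)]


def required_confidence(target_error_rate, table=THRESHOLDS):
    """Lowest confidence threshold whose error rate stays within the target."""
    eligible = [min_conf for rate, min_conf in table if rate <= target_error_rate]
    return min(eligible) if eligible else None


def accept_automatically(conf, target_error_rate, table=THRESHOLDS):
    """True if the result can be accepted without human review."""
    needed = required_confidence(target_error_rate, table)
    return needed is not None and conf >= needed


# Example: top score 0.9, runner-up 0.3, object with 40 features.
c = confidence(0.9, 0.3, 40)
auto_at_10_percent = accept_automatically(c, 0.10)
auto_at_5_percent = accept_automatically(c, 0.05)
```

Choosing the lowest eligible threshold reflects the stated goal of maximizing the number of objects processed automatically at a given error rate; a stricter error target raises the bar and routes more objects to human review.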