    • 1. Granted invention patent
    • Automatic labeling of unlabeled text data
    • US06697998B1
    • 2004-02-24
    • US09591497
    • 2000-06-12
    • Frederick J. Damerau; David E. Johnson; Martin C. Buskirk, Jr.
    • G06F17/21
    • G06F17/3071
    • A method of automatically labeling unlabeled text data can be practiced independent of human intervention, but that does not preclude manual intervention. The method can be used to extract relevant features of unlabeled text data for a keyword search. The method of automated labeling of unlabeled text data uses a document collection as a reference answer set. Members of the answer set are converted to vectors representing centroids of unknown groups of unlabeled text data. Unlabeled text data are clustered relative to the centroids by a nearest neighbor algorithm, and the ID of the relevant answer is assigned to all documents in the cluster. At this point in the process, a supervised machine learning algorithm is trained on the labeled data, and a classifier for assigning labels to new text data is output. Alternatively, a feature extraction algorithm may be run on the classes generated by the clustering step, and search features are output which index the unlabeled text data.
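The clustering step in the abstract above can be illustrated with a small sketch. It is not the patented implementation: the bag-of-words vectors, the cosine similarity measure, and the names `vectorize`, `cosine`, and `label_by_nearest_centroid` are all assumptions made for illustration. Each answer document acts as a centroid, and every unlabeled text receives the ID of its nearest answer.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts (an assumed vector representation)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def label_by_nearest_centroid(answers, unlabeled):
    """Assign each unlabeled text the ID of the most similar answer document."""
    centroids = {answer_id: vectorize(text) for answer_id, text in answers.items()}
    labeled = []
    for doc in unlabeled:
        vec = vectorize(doc)
        best = max(centroids, key=lambda answer_id: cosine(vec, centroids[answer_id]))
        labeled.append((doc, best))
    return labeled

# Hypothetical answer set and unlabeled queries.
answers = {
    "A1": "reset your password from the login page",
    "A2": "track the shipping status of your order",
}
unlabeled = [
    "how do I reset my password",
    "where is my order shipping status",
]
labeled = label_by_nearest_centroid(answers, unlabeled)
```

The resulting `(text, answer ID)` pairs are the kind of labeled data that the abstract says is then passed to a supervised learner; that training step is not shown here.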
    • 3. Granted invention patent
    • Categorization based text processing
    • US06618715B1
    • 2003-09-09
    • US09589398
    • 2000-06-08
    • David E. Johnson; Frederick J. Damerau
    • G06F17/00
    • G06F17/30705; G06F17/27
    • A rules-based configurable system efficiently and effectively determines, for a given electronically represented text document, which linguistic analysis and extraction processes and which application-specific processes should be invoked to provide more accurate answers to a user's query. In a rules-based classifier, where each category or topic is represented by a set of rules, in an application such as routing, the categorization effecting the routing can be effectively combined with processes extracting other information. This may be in the form of a prompt for the user to input additional information.
    • 4. Granted invention patent
    • Automated set up of web-based natural language interface
    • US07177796B1
    • 2007-02-13
    • US09605709
    • 2000-06-27
    • Frederick J. Damerau; David E. Johnson
    • G06F17/27
    • G06F17/277; G06F17/3089
    • A procedure automates the process of setting up an instance of a conversational natural language interface for a Web site. By automating the process of setting up a new Web site, the process enables a new interface to be created by anyone. Subsequent manual tuning of the interface is possible and much easier to do than creating an interface from scratch. In order to set up an instance of a natural language conversational interface, it is necessary to define a hierarchy of topics into which individual documents or Web pages can be classified, provide a keyword index for those documents for an associated search engine, and for each node in the hierarchy, specify a mechanism for associating an input natural language (NL) query to the node.
    • 5. Granted invention patent
    • Rule induction for summarizing documents in a classified document collection
    • US07162413B1
    • 2007-01-09
    • US09349494
    • 1999-07-09
    • David E. Johnson; Frederick J. Damerau
    • G06F17/27; G06F12/21
    • G06F17/30719
    • A method and apparatus for providing summaries of documents belonging to a class of documents in a classified document collection. A sample set of documents belonging to one or more classes is processed via a machine learning system in order to induce a set of rules associated with the sample set of documents. The vocabulary in the rules is extracted and compared to words, terms or phrases of an incoming document. Any matches between the extracted rule vocabulary and the words, terms or phrases of the incoming document are used as a summary for the incoming document. By using the method and apparatus, it is not necessary to process each document to find its most important words and the like in order to provide a summary for that document, and then to repeat the same process for additional documents.
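The matching step in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the patented method: the rules are hand-written here rather than induced by a machine learning system, each rule is assumed to be a tuple of condition terms plus a class label, and the names `rule_vocabulary` and `summarize` are hypothetical.

```python
def rule_vocabulary(rules):
    """Collect every term that appears in a rule condition."""
    vocab = set()
    for conditions, _label in rules:
        vocab.update(term.lower() for term in conditions)
    return vocab

def summarize(document, vocab):
    """Return the rule-vocabulary terms found in the document, in order of first appearance."""
    seen = set()
    summary = []
    for word in document.lower().split():
        token = word.strip(".,;:!?()\"'")
        if token in vocab and token not in seen:
            seen.add(token)
            summary.append(token)
    return summary

# Hypothetical rules standing in for the output of a rule induction system.
rules = [
    (("interest", "rate"), "finance"),
    (("merger",), "finance"),
    (("vaccine", "trial"), "health"),
]
vocab = rule_vocabulary(rules)
summary = summarize("The bank raised its interest rate after the merger.", vocab)
```

Because the vocabulary is extracted once from the rules, summarizing each further document only needs a set lookup per word, which is the saving the abstract describes.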
    • 7. Granted invention patent
    • Guess-ahead feature for a keyboard-display terminal data input system
    • US4330845A
    • 1982-05-18
    • US108774
    • 1979-12-31
    • Frederick J. Damerau
    • G06F3/153; G06F3/02; G06F17/21; G06F17/24; G06F17/27; G06F3/14; G06F7/34
    • G06F17/24; G06F17/276
    • A guess-ahead feature for an interactive terminal having a keyboard and a display screen where input data is entered via the keyboard and displayed. Means are provided for continually evaluating input data to determine if it is the beginning of a string of data stored in the system memory. If the input data is determined to match the beginning of a string of prestored data, the complete string of stored data is displayed without moving the cursor. A function key is provided so that if the displayed complete string is the string the terminal operator desires to enter, the terminal operator can, by depressing the function key, advance the cursor to the end of the string. If, however, the displayed string is not exactly as desired, the operator merely continues keying input data.
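The interaction described in the abstract above can be modeled in a few lines. This is a simplified sketch, not the patented terminal logic: the class name `GuessAhead`, its methods, and the rule of showing the first stored string in sorted order when several match are all assumptions.

```python
class GuessAhead:
    """Toy model of guess-ahead input against a list of prestored strings."""

    def __init__(self, stored):
        self.stored = sorted(stored)  # assumed tie-break: alphabetical order
        self.typed = ""               # text up to the cursor position

    def guess(self):
        """Return the stored string that the typed text begins, if any."""
        if not self.typed:
            return None
        for candidate in self.stored:
            if candidate.startswith(self.typed):
                return candidate
        return None

    def key(self, ch):
        """Type one character; the cursor advances by that character only."""
        self.typed += ch
        return self.guess()

    def accept(self):
        """Function key: advance the cursor to the end of the displayed guess."""
        guess = self.guess()
        if guess is not None:
            self.typed = guess
        return self.typed

# Hypothetical stored strings and keystrokes.
terminal = GuessAhead(["program", "print", "procedure"])
shown = [terminal.key(ch) for ch in "prog"]
entered = terminal.accept()
```

In this run the displayed guess changes as the operator keeps typing, and pressing the accept key after `prog` completes the entry as `program`.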
    • 8. Granted invention patent
    • Storing and retrieving records in a computer system
    • US5390359A
    • 1995-02-14
    • US854170
    • 1992-03-20
    • Frederick J. Damerau
    • G06F12/00; G06F17/30; G06F7/00
    • G06F17/30949; Y10S707/99933
    • A method and apparatus for determining whether a record, or an edited version thereof, is stored in a computer system. With this invention, whenever a record is stored in the system, a hash function is applied to subsets of a key representing the record to be stored to generate multiple hash addresses. A copy of the key, or a pointer thereto, is stored at each of the generated hash addresses. Whenever one wishes to determine whether a key is stored in the system, a hash function is applied to subsets of the key for the test record to generate multiple hash addresses. The key for the test record is then compared with the key stored at each of the generated hash addresses. If the key for the test record is sufficiently close to any one of the keys found at the hash addresses, the test record is assumed to be stored in the system.
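A rough sketch of the store-and-probe scheme in the abstract above follows. The abstract does not say how the key subsets are formed or how closeness is measured, so both are assumptions here: each subset is the key with one character deleted, addresses come from `zlib.crc32`, and closeness is a `difflib.SequenceMatcher` ratio against an arbitrary threshold. The class name `FuzzyKeyStore` is hypothetical.

```python
import zlib
from difflib import SequenceMatcher

class FuzzyKeyStore:
    """Stores each key at several hash addresses derived from subsets of the key."""

    def __init__(self, size=1024, threshold=0.8):
        self.size = size            # number of hash addresses (assumed)
        self.threshold = threshold  # minimum similarity to count as "close" (assumed)
        self.buckets = {}

    @staticmethod
    def subsets(key):
        """Assumed subset scheme: the key with one character removed."""
        return {key[:i] + key[i + 1:] for i in range(len(key))}

    def addresses(self, key):
        """Hash every subset of the key to an address in the table."""
        return {zlib.crc32(s.encode("utf-8")) % self.size for s in self.subsets(key)}

    def store(self, key):
        """Keep a copy of the key at each generated address."""
        for addr in self.addresses(key):
            self.buckets.setdefault(addr, set()).add(key)

    def probably_stored(self, test_key):
        """True if a sufficiently close key is found at any generated address."""
        for addr in self.addresses(test_key):
            for candidate in self.buckets.get(addr, ()):
                if SequenceMatcher(None, candidate, test_key).ratio() >= self.threshold:
                    return True
        return False

store = FuzzyKeyStore()
store.store("damerau")
exact = store.probably_stored("damerau")
edited = store.probably_stored("damarau")   # one substituted character
unrelated = store.probably_stored("johnson")
```

With single-character deletion subsets, a key and a copy with one substituted character share at least one subset, so they meet at a common address and the closeness test decides the match.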
    • 10. Granted invention patent
    • Light weight document matcher
    • US06286000B1
    • 2001-09-04
    • US09203673
    • 1998-12-01
    • Chidanand Apte; Frederick J. Damerau; Sholom M. Weiss; Brian F. White
    • G06F17/30
    • G06F17/30622; Y10S707/99934; Y10S707/99935
    • A lightweight document matcher employs minimal processing and storage. The lightweight document matcher matches new documents to those stored in a database. The matcher lists, in order, those stored documents that are most similar to the new document. The new documents are typically problem statements or queries, and the stored documents are potential solutions such as FAQs (Frequently Asked Questions). Given a set of documents, titles, and possibly keywords, an automatic back-end process constructs a global dictionary of unique keywords and local dictionaries of relevant words for each document. The application front-end uses this information to score the relevance of stored documents to new documents. The scoring algorithm uses the count of matched words as a base score, and then assigns bonuses to words that have high predictive value. It optionally assigns an extra bonus for a match of words in special sections, e.g., titles. The method uses minimal data structures and lightweight scoring algorithms to compute efficiently even in restricted environments, such as mobile or small desktop computers.
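The scoring scheme in the abstract above can be sketched as below. The abstract does not define "high predictive value" or give bonus sizes, so this sketch assumes that a word is predictive when it occurs in at most `max_df` stored documents, and the bonus weights are arbitrary. The function names `tokenize`, `build_index`, and `rank` are hypothetical.

```python
from collections import Counter

PUNCT = ".,;:!?()\"'"

def tokenize(text):
    """Lowercase words with surrounding punctuation removed."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]

def build_index(docs):
    """Back end: local word sets per document plus a global document-frequency table."""
    local = {}
    for doc_id, (title, body) in docs.items():
        title_words = set(tokenize(title))
        local[doc_id] = (title_words, title_words | set(tokenize(body)))
    doc_freq = Counter(w for _title, words in local.values() for w in words)
    return local, doc_freq

def rank(query, local, doc_freq, rare_bonus=1.0, title_bonus=0.5, max_df=1):
    """Front end: base score = matched word count, plus bonuses; best match first."""
    query_words = set(tokenize(query))
    scores = {}
    for doc_id, (title_words, words) in local.items():
        matched = query_words & words
        if not matched:
            continue
        score = float(len(matched))
        score += rare_bonus * sum(1 for w in matched if doc_freq[w] <= max_df)
        score += title_bonus * len(matched & title_words)
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical FAQ collection and query.
docs = {
    "faq1": ("Reset password", "Use the login page to reset your password."),
    "faq2": ("Track order", "Use the orders page to track your shipment."),
}
local, doc_freq = build_index(docs)
ranked = rank("How do I use the page to reset my password?", local, doc_freq)
```

Both documents share common words with the query, but only `faq1` matches the rarer words `reset` and `password`, which also appear in its title, so it is listed first.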