Quick Patent Search - Search patents worldwide - Free patent database for commercial use - IPRDB

1. Patent application

US20080104056A1 Distributional similarity-based models for query correction (Active)
Publication No.: US20080104056A1
Publication date: 2008-05-01
Application No.: US11589557
Filing date: 2006-10-30
Applicants: Mu Li, Ming Zhou
Inventors: Mu Li, Ming Zhou
IPC classification: G06F17/30
CPC classification: G06F17/3069, G06F17/30672, G06F17/30687, G06Q10/063, Y10S707/99933, Y10S707/99934, Y10S707/99935
Abstract: A distributional similarity between a word of a search query and a term of a candidate word sequence is used to determine an error model probability that describes the probability of the search query given the candidate word sequence. The error model probability is used to determine a probability of the candidate word sequence given the search query. The probability of the candidate word sequence given the search query is used to select a candidate word sequence as a corrected word sequence for the search query. Distributional similarity is also used to build features that are applied in a maximum entropy model to compute the probability of the candidate word sequence given the search query.

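The abstract ranks candidate corrections by combining an error model P(q | c), derived from distributional similarity, with a prior on the candidate. The sketch below illustrates that noisy-channel ranking under assumptions that are not taken from the patent: context vectors are neighbouring-word counts from a hypothetical query log, similarity is cosine, the similarity is used directly as an unnormalized stand-in for P(q | c), and `lm_prob` is a caller-supplied language model. All function names are hypothetical, and the maximum-entropy variant is not shown.

```python
import math
from collections import Counter


def context_vectors(query_log, window=1):
    """Count the neighbouring words of every word in a (hypothetical) query log."""
    vecs = {}
    for query in query_log:
        words = query.split()
        for i, w in enumerate(words):
            vec = vecs.setdefault(w, Counter())
            for j in range(max(0, i - window), min(len(words), i + window + 1)):
                if j != i:
                    vec[words[j]] += 1
    return vecs


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def correct(query, candidates, vecs, lm_prob, floor=1e-6):
    """Return argmax_c P(q | c) * P(c) over word-aligned candidates.

    P(q | c) is approximated by the product of per-word distributional
    similarities (unnormalized); P(c) comes from the caller's language model.
    """
    q_words = query.split()
    best, best_score = None, -math.inf
    for cand in candidates:
        c_words = cand.split()
        if len(c_words) != len(q_words):
            continue  # simplification: only one-to-one word alignments
        log_err = 0.0
        for qw, cw in zip(q_words, c_words):
            sim = 1.0 if qw == cw else cosine(vecs.get(qw, Counter()), vecs.get(cw, Counter()))
            log_err += math.log(max(sim, floor))  # floor avoids log(0)
        score = log_err + math.log(lm_prob(cand))
        if score > best_score:
            best, best_score = cand, score
    return best
```

With a query log in which "britny" and "britney" share the neighbour "spears", and a language model that prefers "britney spears", `correct("britny spears", ["britny spears", "britney spears"], ...)` returns "britney spears": the similarity keeps the error-model cost of the substitution low, so the prior decides.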
2. Patent application

US20060287848A1 Language classification with random feature clustering (Pending, published)
Publication No.: US20060287848A1
Publication date: 2006-12-21
Application No.: US11157091
Filing date: 2005-06-20
Applicants: Mu Li, Jianfeng Gao, Ming Zhou
Inventors: Mu Li, Jianfeng Gao, Ming Zhou
IPC classification: G06F17/27
CPC classification: G06F16/355
Abstract: An ensemble of random feature clusters is built from training data using a clustering algorithm where some randomness has been introduced. For each clustered feature space, a classifier, such as a Naïve Bayesian Classifier, is trained, realizing a classifier ensemble. The final classification decision is made by the resulting classifier ensemble.

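A minimal sketch of the ensemble idea in the abstract, under assumptions the patent does not spell out: features are words, each word is represented by its class distribution, the randomness comes from choosing k-means initial centroids at random for each ensemble member, each member is a multinomial Naive Bayes model over cluster IDs, and the final decision is a majority vote. All names are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def feature_profiles(docs, labels, classes):
    """Map each word to its normalized class-count vector."""
    counts = defaultdict(lambda: [0.0] * len(classes))
    for doc, y in zip(docs, labels):
        for w in doc.split():
            counts[w][classes.index(y)] += 1
    return {w: [c / sum(v) for c in v] for w, v in counts.items()}


def random_kmeans(profiles, k, rng, iters=10):
    """k-means over word profiles; randomness enters via the initial centroids."""
    words = sorted(profiles)
    centroids = [profiles[w][:] for w in rng.sample(words, k)]
    assign = {}
    for _ in range(iters):
        for w in words:
            assign[w] = min(range(k), key=lambda j: sum(
                (a - b) ** 2 for a, b in zip(profiles[w], centroids[j])))
        for j in range(k):
            members = [profiles[w] for w in words if assign[w] == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign


class ClusteredNB:
    """Multinomial Naive Bayes whose features are cluster IDs instead of words."""

    def __init__(self, assign, k, classes):
        self.assign, self.k, self.classes = assign, k, classes

    def fit(self, docs, labels):
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(self.assign[w] for w in doc.split() if w in self.assign)
        return self

    def predict(self, doc):
        def log_post(c):
            total = sum(self.counts[c].values())
            lp = math.log(self.prior[c])
            for w in doc.split():
                if w in self.assign:  # words unseen in training are ignored
                    # Laplace smoothing over the k cluster IDs
                    lp += math.log((self.counts[c][self.assign[w]] + 1) / (total + self.k))
            return lp
        return max(self.classes, key=log_post)


def train_ensemble(docs, labels, k=2, members=5, seed=0):
    classes = sorted(set(labels))
    profiles = feature_profiles(docs, labels, classes)
    rng = random.Random(seed)
    return [ClusteredNB(random_kmeans(profiles, k, rng), k, classes).fit(docs, labels)
            for _ in range(members)]


def ensemble_predict(ensemble, doc):
    """Majority vote over the ensemble members."""
    return Counter(m.predict(doc) for m in ensemble).most_common(1)[0][0]
```

Each member sees a differently clustered (coarser) feature space, so the members disagree in useful ways; voting then smooths out any single unlucky clustering.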
3. Granted patent

US07493251B2 Using source-channel models for word segmentation (Active)
Publication No.: US07493251B2
Publication date: 2009-02-17
Application No.: US10448644
Filing date: 2003-05-30
Applicants: Jianfeng Gao, Mu Li, Chang-Ning Huang, Jian Sun, Lei Zhang, Ming Zhou
Inventors: Jianfeng Gao, Mu Li, Chang-Ning Huang, Jian Sun, Lei Zhang, Ming Zhou
IPC classification: G06F17/27, G10L11/00
CPC classification: G06F17/2755, G06F17/277
Abstract: A method and apparatus for segmenting text is provided that identifies a sequence of entity types from a sequence of characters and thereby identifies a segmentation for the sequence of characters. Under the invention, the sequence of entity types is identified using probabilistic models that describe the likelihood of a sequence of entities and the likelihood of sequences of characters given particular entities. Under one aspect of the invention, organization name entities are identified from a first sequence of identified entities to form a final sequence of identified entities.

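The source-channel formulation in the abstract picks the entity sequence W maximizing P(W) · P(S | W), where S is the character string. A minimal Viterbi sketch, assuming a hypothetical toy model: a unigram class model `CLASS_LOGPROB`, lexicon words that emit only themselves, and one factoid class `NUM` that emits digit strings. The organization-name pass mentioned in the abstract is not shown.

```python
import math

# Hypothetical toy class model P(class); real systems use class n-gram models.
CLASS_LOGPROB = {c: math.log(p) for c, p in {
    "北京": 0.2, "大学": 0.2, "北京大学": 0.1, "大学生": 0.1,
    "生": 0.1, "年": 0.1, "NUM": 0.2}.items()}


def channel_logprob(chars, cls):
    """log P(chars | class): lexicon words emit themselves; NUM emits digit strings."""
    if cls == "NUM":
        return len(chars) * math.log(0.1) if chars.isdigit() else -math.inf
    return 0.0 if chars == cls else -math.inf


def segment(text):
    """Viterbi search for argmax_W P(W) * P(text | W); returns [(chars, class)]."""
    n = len(text)
    # best[i] = (log score of best analysis of text[:i], start of last unit, its class)
    best = [(-math.inf, -1, None) for _ in range(n + 1)]
    best[0] = (0.0, -1, None)
    for end in range(1, n + 1):
        for start in range(end):
            if best[start][0] == -math.inf:
                continue
            chars = text[start:end]
            for cls, lp in CLASS_LOGPROB.items():
                score = best[start][0] + lp + channel_logprob(chars, cls)
                if score > best[end][0]:
                    best[end] = (score, start, cls)
    if best[n][0] == -math.inf:
        return None  # no analysis covers the whole string
    units, end = [], n
    while end > 0:
        _, start, cls = best[end]
        units.append((text[start:end], cls))
        end = start
    return units[::-1]
```

Under these toy numbers, `segment("北京大学生")` yields 北京 / 大学生 (0.2 × 0.1) rather than 北京大学 / 生 (0.1 × 0.1), and `segment("2003年")` groups the digits into a single `NUM` entity.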
4. Patent application

US20080046405A1 QUERY SPELLER (Active)
Publication No.: US20080046405A1
Publication date: 2008-02-21
Application No.: US11465023
Filing date: 2006-08-16
Applicants: Elliott K. Olds, Gregory N. Hullender, Haoyong Zhang, Janine R. Crumb, Jianfeng Gao, Ming Zhou, Mu Li, Yajuan Lv
Inventors: Elliott K. Olds, Gregory N. Hullender, Haoyong Zhang, Janine R. Crumb, Jianfeng Gao, Ming Zhou, Mu Li, Yajuan Lv
IPC classification: G06F17/30
CPC classification: G06F17/3064
Abstract: Candidate suggestions for correcting misspelled query terms input into a search application are automatically generated. A score for each candidate suggestion can be generated using a first decoding pass and paths through the suggestions can be ranked in a second decoding pass. Candidate suggestions can be generated based on typographical errors, phonetic mistakes and/or compounding mistakes. Furthermore, a ranking model can be developed to rank candidate suggestions to be presented to a user.

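The two decoding passes in the abstract can be sketched as follows, assuming hypothetical unigram and bigram tables: pass 1 generates typographical candidates (one edit away, in the style of the well-known `edits1` generator) and scores each term's candidates independently; pass 2 runs Viterbi over the resulting lattice with bigram transitions. The channel penalty `edit_logp`, the `backoff` value and the way the scores are combined are illustrative assumptions, and phonetic and compounding candidates are not shown.

```python
import math


def edits1(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """All strings one deletion, transposition, substitution or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in alphabet]
    inserts = [a + c + b for a, b in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)


def first_pass(term, unigram, top_n=3, edit_logp=math.log(0.01), oov_logp=math.log(1e-6)):
    """Pass 1: score each candidate for one term as log P(term | cand) + log P(cand)."""
    scores = {}
    if term in unigram:
        scores[term] = math.log(unigram[term])  # unchanged term: channel prob taken as 1
    for cand in edits1(term):
        if cand in unigram and cand not in scores:
            scores[cand] = edit_logp + math.log(unigram[cand])
    if not scores:
        scores[term] = oov_logp  # no known candidate: keep the term as typed
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_n]


def second_pass(lattice, bigram, backoff=1e-4):
    """Pass 2: Viterbi over the candidate lattice using bigram transition scores."""
    paths = {cand: (score, [cand]) for cand, score in lattice[0]}
    for column in lattice[1:]:
        paths = {
            cand: max(
                (prev_score + score + math.log(bigram.get((prev, cand), backoff)), path + [cand])
                for prev, (prev_score, path) in paths.items()
            )
            for cand, score in column
        }
    return max(paths.values())[1]


def correct_query(query, unigram, bigram):
    lattice = [first_pass(term, unigram) for term in query.split()]
    return " ".join(second_pass(lattice, bigram)) if lattice else ""
```

Splitting the work this way keeps pass 1 cheap (per term) and lets pass 2 use context: with a table in which ("britney", "spears") is frequent, `correct_query("britny spears", ...)` keeps "spears" over the nearby "pears".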
5. Patent application

US20060282255A1 Collocation translation from monolingual and available bilingual corpora (Pending, published)
Publication No.: US20060282255A1
Publication date: 2006-12-14
Application No.: US11152540
Filing date: 2005-06-14
Applicants: Yajuan Lu, Jianfeng Gao, Ming Zhou, John Chen, Mu Li
Inventors: Yajuan Lu, Jianfeng Gao, Ming Zhou, John Chen, Mu Li
IPC classification: G06F17/28
CPC classification: G06F17/2827
Abstract: A system and method of extracting collocation translations is presented. The methods include constructing a collocation translation model using monolingual source and target language corpora as well as bilingual corpus, if available. The collocation translation model employs an expectation maximization algorithm with respect to contextual words surrounding collocations. The collocation translation model can be used later to extract a collocation translation dictionary. Optional filters based on context redundancy and/or bi-directional translation constraint can be used to ensure that only highly reliable collocation translations are included in the dictionary. The constructed collocation translation model and the extracted collocation translation dictionary can be used later for further natural language processing, such as sentence translation.

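Of the steps in this abstract, the bi-directional translation constraint is the easiest to illustrate in isolation. A minimal sketch, assuming that forward (source to target) and backward (target to source) collocation translation scores have already been produced by some model (the EM-based model itself is not shown, and the scores below are hypothetical): a pair is kept only if each side is the other's best translation.

```python
def best_translation(scores):
    """Highest-scoring candidate in a {candidate: score} dict."""
    return max(scores, key=scores.get)


def bidirectional_filter(forward, backward):
    """Keep (src, tgt) pairs that are each other's best translation."""
    best_fwd = {s: best_translation(c) for s, c in forward.items() if c}
    best_bwd = {t: best_translation(c) for t, c in backward.items() if c}
    return {(s, t) for s, t in best_fwd.items() if best_bwd.get(t) == s}


# Hypothetical model scores for Chinese verb-object collocations.
forward = {"打 电话": {"make call": 0.6, "hit telephone": 0.3},
           "开 会": {"open meeting": 0.5, "hold meeting": 0.4}}
backward = {"make call": {"打 电话": 0.8},
            "open meeting": {"开 门": 0.7, "开 会": 0.2},
            "hold meeting": {"开 会": 0.9}}
```

Here only ("打 电话", "make call") survives: the forward best for 开 会 points back to a different source collocation, so the pair is dropped. The filter trades recall for precision, which matches the abstract's goal of admitting only highly reliable entries into the dictionary.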
6. Patent application

US20120022850A1 Statistical machine translation processing (Pending, published)
Publication No.: US20120022850A1
Publication date: 2012-01-26
Application No.: US13250417
Filing date: 2011-09-30
Applicants: Chi-Ho Li, Mu Li, Dongdong Zhang, Ming Zhou
Inventors: Chi-Ho Li, Mu Li, Dongdong Zhang, Ming Zhou
IPC classification: G06F17/28
CPC classification: G06F17/2818
Abstract: A method of statistical machine translation (SMT) is provided. The method comprises generating reordering knowledge based on the syntax of a source language (SL) and a number of alignment matrices that map sample SL sentences with sample target language (TL) sentences. The method further comprises receiving a SL word string and parsing the SL word string into a parse tree that represents the syntactic properties of the SL word string. The nodes on the parse tree are reordered based on the generated reordering knowledge in order to provide reordered word strings. The method further comprises translating a number of reordered word strings to create a number of TL word strings, and identifying a statistically preferred TL word string as a preferred translation of the SL word string.

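The reorder-then-translate pipeline in the abstract can be sketched as below, under hypothetical assumptions: the reordering knowledge is a table mapping (parent label, child labels) to weighted child permutations (learning it from alignment matrices is not shown), the SMT decoder is replaced by monotone word-for-word lookup in a toy `lexicon`, and the "statistically preferred" string is the one with the highest reordering-plus-translation log probability.

```python
import math
from itertools import product

# A parse-tree node is (label, [children]) or, for a leaf, (label, word).


def reorderings(node, rules):
    """Yield (SL word list, log prob) for every reordering licensed by `rules`."""
    label, kids = node
    if isinstance(kids, str):
        yield [kids], 0.0
        return
    child_opts = [list(reorderings(k, rules)) for k in kids]
    key = (label, tuple(k[0] for k in kids))
    # Nodes without a rule keep their original child order.
    perms = rules.get(key, [(tuple(range(len(kids))), 1.0)])
    for perm, p in perms:
        for combo in product(*child_opts):
            words = [w for i in perm for w in combo[i][0]]
            yield words, math.log(p) + sum(lp for _, lp in combo)


def translate(tree, rules, lexicon):
    """Translate every reordered string and return the best-scoring TL string."""
    best, best_score = None, -math.inf
    for words, lp in reorderings(tree, rules):
        if not all(w in lexicon for w in words):
            continue  # stand-in decoder cannot translate this string
        score = lp + sum(math.log(lexicon[w][1]) for w in words)
        if score > best_score:
            best, best_score = " ".join(lexicon[w][0] for w in words), score
    return best


tree = ("IP", [("NP", "我"),
               ("VP", [("PP", [("P", "在"), ("NR", "北京")]), ("VV", "工作")])])
rules = {("VP", ("PP", "VV")): [((1, 0), 0.8), ((0, 1), 0.2)]}
lexicon = {"我": ("I", 1.0), "在": ("in", 1.0), "北京": ("Beijing", 1.0), "工作": ("work", 1.0)}
```

With the toy rule that a Chinese PP usually moves after its verb, `translate(tree, rules, lexicon)` returns "I work in Beijing"; because the word order is fixed before decoding, even a monotone translator produces the English order.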
7. Granted patent

US07818332B2 Query speller (Active)
Publication No.: US07818332B2
Publication date: 2010-10-19
Application No.: US11465023
Filing date: 2006-08-16
Applicants: Elliott K. Olds, Gregory N. Hullender, Haoyong Zhang, Janine R. Crumb, Jianfeng Gao, Ming Zhou, Mu Li, Yajuan Lv
Inventors: Elliott K. Olds, Gregory N. Hullender, Haoyong Zhang, Janine R. Crumb, Jianfeng Gao, Ming Zhou, Mu Li, Yajuan Lv
IPC classification: G06F7/00, G06F17/30
CPC classification: G06F17/3064
Abstract: Candidate suggestions for correcting misspelled query terms input into a search application are automatically generated. A score for each candidate suggestion can be generated using a first decoding pass and paths through the suggestions can be ranked in a second decoding pass. Candidate suggestions can be generated based on typographical errors, phonetic mistakes and/or compounding mistakes. Furthermore, a ranking model can be developed to rank candidate suggestions to be presented to a user.

8. Granted patent

US07590626B2 Distributional similarity-based models for query correction (Active)
Publication No.: US07590626B2
Publication date: 2009-09-15
Application No.: US11589557
Filing date: 2006-10-30
Applicants: Mu Li, Ming Zhou
Inventors: Mu Li, Ming Zhou
IPC classification: G06F17/30
CPC classification: G06F17/3069, G06F17/30672, G06F17/30687, G06Q10/063, Y10S707/99933, Y10S707/99934, Y10S707/99935
Abstract: A distributional similarity between a word of a search query and a term of a candidate word sequence is used to determine an error model probability that describes the probability of the search query given the candidate word sequence. The error model probability is used to determine a probability of the candidate word sequence given the search query. The probability of the candidate word sequence given the search query is used to select a candidate word sequence as a corrected word sequence for the search query. Distributional similarity is also used to build features that are applied in a maximum entropy model to compute the probability of the candidate word sequence given the search query.

9. Patent application

US20090106015A1 Statistical machine translation processing (Active)
Publication No.: US20090106015A1
Publication date: 2009-04-23
Application No.: US11977133
Filing date: 2007-10-23
Applicants: Chi-Ho Li, Mu Li, Dongdong Zhang, Ming Zhou
Inventors: Chi-Ho Li, Mu Li, Dongdong Zhang, Ming Zhou
IPC classification: G06F17/28
CPC classification: G06F17/2818
Abstract: A method of statistical machine translation (SMT) is provided. The method comprises generating reordering knowledge based on the syntax of a source language (SL) and a number of alignment matrices that map sample SL sentences with sample target language (TL) sentences. The method further comprises receiving a SL word string and parsing the SL word string into a parse tree that represents the syntactic properties of the SL word string. The nodes on the parse tree are reordered based on the generated reordering knowledge in order to provide reordered word strings. The method further comprises translating a number of reordered word strings to create a number of TL word strings, and identifying a statistically preferred TL word string as a preferred translation of the SL word string.

10. Granted patent

US07092567B2 Post-processing system and method for correcting machine recognized text (Lapsed)
Publication No.: US07092567B2
Publication date: 2006-08-15
Application No.: US10288645
Filing date: 2002-11-04
Applicants: Yue Ma, Jinhong Katherine Guo, Mu Li, Yu-kun Tong, Tian-shun Yao, Jing-bo Zhu
Inventors: Yue Ma, Jinhong Katherine Guo, Mu Li, Yu-kun Tong, Tian-shun Yao, Jing-bo Zhu
IPC classification: G06K9/34, G06K9/72, G06K9/03, G06F17/27
CPC classification: G06K9/723, G06K2209/01
Abstract: A method of post-processing character data from an optical character recognition (OCR) engine and apparatus to perform the method. This exemplary method includes segmenting the character data into a set of initial words. The set of initial words is word level processed to determine at least one candidate word corresponding to each initial word. The set of initial words is segmented into a set of sentences. Each sentence in the set of sentences includes a plurality of initial words and candidate words corresponding to the initial words. A sentence is selected from the set of sentences. The selected sentence is word disambiguity processed to determine a plurality of final words. A final word is selected from the at least one candidate word corresponding to a matching initial word. The plurality of final words is then assembled as post-processed OCR data.

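The OCR post-processing steps in the abstract (initial words, word-level candidates, sentence segmentation, sentence-level disambiguation) can be sketched as follows. Everything specific here is a hypothetical assumption: whitespace tokenization stands in for segmenting the character data, `CONFUSIONS` is a toy table of OCR confusions, sentences end at a trailing period, and disambiguation picks the candidate combination with the best bigram score by exhaustive search (exponential in sentence length; a real system would use dynamic programming).

```python
import math
from itertools import product

# Hypothetical OCR confusions: recognized string -> possibly intended strings.
CONFUSIONS = {"0": ["o"], "1": ["l", "i"], "5": ["s"], "rn": ["m"]}


def word_candidates(word, lexicon):
    """Word-level step: the word itself (if known) plus single-confusion fixes in the lexicon."""
    cands = {word} if word in lexicon else set()
    for wrong, rights in CONFUSIONS.items():
        start = word.find(wrong)
        while start != -1:
            for right in rights:
                fixed = word[:start] + right + word[start + len(wrong):]
                if fixed in lexicon:
                    cands.add(fixed)
            start = word.find(wrong, start + 1)
    return sorted(cands) or [word]  # unknown words with no fix are kept unchanged


def split_sentences(words):
    """Group initial words into sentences at tokens ending in '.'."""
    sentences, current = [], []
    for w in words:
        stripped = w.rstrip(".")
        if stripped:
            current.append(stripped)
        if w.endswith("."):
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


def disambiguate(sentence, lexicon, bigram, backoff=1e-3):
    """Sentence-level step: choose the candidate combination with the best bigram score."""
    options = [word_candidates(w, lexicon) for w in sentence]

    def score(seq):
        return sum(math.log(bigram.get(pair, backoff)) for pair in zip(seq, seq[1:]))

    return list(max(product(*options), key=score))


def post_process(text, lexicon, bigram):
    """Return one corrected string per sentence (sentence punctuation is dropped)."""
    return [" ".join(disambiguate(s, lexicon, bigram)) for s in split_sentences(text.split())]
```

The word-level step alone cannot decide between "hail" and "hall" for the OCR output "ha1l"; the sentence-level step resolves it from context, so with a bigram table favouring ("hail", "storm"), "the ha1l storm." comes out as "the hail storm".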