Patent Quick Search - Fast search of global patents, free patent database for commercial use - IPRDB

1. Invention application

WO2006035402A1 AUTOMATIC TEXT CORRECTION [Under examination - Published]
Publication (announcement) No.: WO2006035402A1
Publication (announcement) date: 2006-04-06
Application No.: PCT/IB2005/053193
Filing date: 2005-09-28
Applicants: KONINKLIJKE PHILIPS ELECTRONICS N.V.; PHILIPS INTELLECTUAL PROPERTY & STANDARDS GMBH; PETERS, Jochen; MATUSOV, Evgeny
Inventors: PETERS, Jochen; MATUSOV, Evgeny
IPC classification: G06F17/22, G10L15/26
CPC classification: G06F17/273, G06F17/2282, G10L15/26
Abstract: The present invention provides a method of generating text transformation rules for speech to text transcription systems. The text transformation rules are generated by means of comparing an erroneous text generated by a speech to text transcription system with a correct reference text. Comparison of erroneous and reference text allows to derive a set of text transformation rules that are evaluated by means of a strict application to the training text and successive comparison with the reference text. Evaluation of text transformation rules provides a sufficient approach to determine which of the automatically generated text transformation rules provide an enhancement or degradation of the erroneous text. In this way only those text transformation rules of the set of text transformation rules are selected that guarantee an enhancement of the erroneous text. In this way systematic errors of an automatic speech recognition or natural language process system can be effectively compensated.
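The rule-generation loop described in this abstract can be illustrated with a minimal sketch. It is not the patented method: the alignment via Python's `difflib.SequenceMatcher`, the token-level error count, and all function names (`derive_rules`, `apply_rule`, `error_count`, `select_rules`) are assumptions made for illustration, and the sample sentences are invented toy data.

```python
from collections import Counter
from difflib import SequenceMatcher


def derive_rules(erroneous, reference):
    """Collect candidate (wrong phrase -> right phrase) rules from one aligned text pair."""
    matcher = SequenceMatcher(a=erroneous, b=reference, autojunk=False)
    rules = Counter()
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":  # only substitutions give a usable left-hand side
            rules[(tuple(erroneous[i1:i2]), tuple(reference[j1:j2]))] += 1
    return rules


def apply_rule(tokens, rule):
    """Strictly apply a rule: rewrite every occurrence of its left-hand side."""
    src, dst = rule
    out, i = [], 0
    while i < len(tokens):
        if tuple(tokens[i:i + len(src)]) == src:
            out.extend(dst)
            i += len(src)
        else:
            out.append(tokens[i])
            i += 1
    return out


def error_count(hypothesis, reference):
    """Count differing tokens, based on the difflib alignment (an assumed metric)."""
    matcher = SequenceMatcher(a=hypothesis, b=reference, autojunk=False)
    return sum(max(i2 - i1, j2 - j1)
               for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal")


def select_rules(erroneous, reference):
    """Keep only rules whose strict application lowers the error count."""
    baseline = error_count(erroneous, reference)
    return [rule for rule in derive_rules(erroneous, reference)
            if error_count(apply_rule(erroneous, rule), reference) < baseline]


# Invented toy training pair: recognizer output vs. corrected reference.
erroneous = "diagnosis colon fever . there results are normal . there is no cough".split()
reference = "diagnosis : fever . their results are normal . there is no cough".split()
selected = select_rules(erroneous, reference)
print(selected)
```

On this toy pair two candidate rules are derived. Rewriting "colon" as ":" removes an error and is kept. Rewriting every "there" as "their" fixes one occurrence but breaks another, so it yields no net improvement and is rejected.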

2. Invention application

WO2005050472A2 TEXT SEGMENTATION AND TOPIC ANNOTATION FOR DOCUMENT STRUCTURING [Under examination - Published]
Publication (announcement) No.: WO2005050472A2
Publication (announcement) date: 2005-06-02
Application No.: PCT/IB2004/052404
Filing date: 2004-11-12
Applicants: PHILIPS INTELLECTUAL PROPERTY & STANDARDS GMBH; KONINKLIJKE PHILIPS ELECTRONICS N. V.; PETERS, Jochen; MEYER, Carsten; KLAKOW, Dietrich; MATUSOV, Evgeny
Inventors: PETERS, Jochen; MEYER, Carsten; KLAKOW, Dietrich; MATUSOV, Evgeny
IPC classification: G06F17/20
CPC classification: G06F17/27, G06F17/2765
Abstract: The invention relates to a method, a computer program product and a computer system for structuring an unstructured text by making use of statistical models trained on annotated training data. Each section of text in which the text is segmented is further assigned to a topic which is associated to a set of labels. The statistical models for the segmentation of the text and for the assignment of a topic and its associated labels to a section of text explicitly accounts for: correlations between a section of text and a topic, a topic transition between sections, a topic position within the document and a (topic-dependent) section length. Hence structural information of the training data is exploited in order to perform segmentation and annotation of unknown text.
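As a rough illustration of joint segmentation and topic assignment, the sketch below labels each sentence with a topic using a Viterbi-style search and merges runs of equal labels into sections. It is a deliberately simplified stand-in, not the patented model: it uses only text-topic correlation (unigram models) and topic transitions, replaces the section-length model with a flat `switch_cost`, and ignores topic position. The function names, the probabilities, and the sample sentences are all hypothetical.

```python
import math

FLOOR = 1e-3  # assumed probability for unseen words and unseen transitions


def emit(model, sentence):
    """Log-probability of a sentence under one topic's unigram model."""
    return sum(math.log(model.get(word, FLOOR)) for word in sentence)


def segment(sentences, topic_models, transitions, switch_cost=1.0):
    """Return [(topic, [sentence indices])]; a topic change starts a new section."""
    topics = list(topic_models)
    best = {t: emit(topic_models[t], sentences[0]) for t in topics}
    history = []
    for sentence in sentences[1:]:
        new_best, pointers = {}, {}
        for t in topics:
            # Staying in a topic is free; switching pays transition and switch cost.
            candidates = {
                p: best[p] if p == t
                else best[p] + math.log(transitions[p].get(t, FLOOR)) - switch_cost
                for p in topics
            }
            prev = max(candidates, key=candidates.get)
            new_best[t] = candidates[prev] + emit(topic_models[t], sentence)
            pointers[t] = prev
        best = new_best
        history.append(pointers)

    # Trace the best path backwards, then restore document order.
    topic = max(best, key=best.get)
    labels = [topic]
    for pointers in reversed(history):
        topic = pointers[topic]
        labels.append(topic)
    labels.reverse()

    # Merge consecutive sentences with the same topic into one section.
    sections = []
    for index, label in enumerate(labels):
        if sections and sections[-1][0] == label:
            sections[-1][1].append(index)
        else:
            sections.append((label, [index]))
    return sections


# Hypothetical toy models; real ones would be trained on annotated data.
topic_models = {
    "history": {"patient": 0.3, "reports": 0.3, "cough": 0.2, "fever": 0.2},
    "plan": {"prescribe": 0.4, "rest": 0.3, "fluids": 0.3},
}
transitions = {"history": {"plan": 0.9}, "plan": {"history": 0.1}}
sentences = [
    ["patient", "reports", "cough"],
    ["patient", "reports", "fever"],
    ["prescribe", "rest"],
    ["prescribe", "fluids"],
]
sections = segment(sentences, topic_models, transitions)
print(sections)
```

With these toy numbers the first two sentences form a "history" section and the last two a "plan" section. A fuller model would add the position and topic-dependent length terms named in the abstract.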

3. Invention application

WO2005050621A2 TOPIC SPECIFIC MODELS FOR TEXT FORMATTING AND SPEECH RECOGNITION [Under examination - Published]
Publication (announcement) No.: WO2005050621A2
Publication (announcement) date: 2005-06-02
Application No.: PCT/IB2004/052403
Filing date: 2004-11-12
Applicants: PHILIPS INTELLECTUAL PROPERTY & STANDARDS GMBH; KONINKLIJKE PHILIPS ELECTRONICS N. V.; PETERS, Jochen; MATUSOV, Evgeny; MEYER, Carsten; KLAKOW, Dietrich
Inventors: PETERS, Jochen; MATUSOV, Evgeny; MEYER, Carsten; KLAKOW, Dietrich
IPC classification: G10L15/22
CPC classification: G10L15/183, G06F17/211, G06F17/2715, G10L15/32
Abstract: The present invention relates to a method, a computer system and a computer program product for speech recognition and/or text formatting by making use of topic specific statistical models. A text document which may be obtained from a first speech recognition pass is subject to segmentation and to an assignment of topic specific models for each obtained section. Each model of the set of models provides statistic information about language model probabilities, about text processing or formatting rules, as e.g. the interpretation of commands for punctuation, formatting, text highlighting or of ambiguous text portions requiring specific formatting, as well as a specific vocabulary being characteristic for each section of the recognized text. Furthermore, other properties of a speech recognition and/or formatting system (such as e.g. settings for the speaking rate) may be encoded in the statistical models. The models themselves are generated on the basis of annotated training data and/or by manual coding. Based on the assignment of models to sections of text an improved speech recognition and/or text formatting procedure is performed.

4. Invention application

WO2005050474A2 TEXT SEGMENTATION AND LABEL ASSIGNMENT WITH USER INTERACTION BY MEANS OF TOPIC SPECIFIC LANGUAGE MODELS AND TOPIC-SPECIFIC LABEL STATISTICS [Under examination - Published]
Publication (announcement) No.: WO2005050474A2
Publication (announcement) date: 2005-06-02
Application No.: PCT/IB2004/052405
Filing date: 2004-11-12
Applicants: PHILIPS INTELLECTUAL PROPERTY & STANDARDS GMBH; KONINKLIJKE PHILIPS ELECTRONICS N. V.; PETERS, Jochen; MATUSOV, Evgeny; MEYER, Carsten; KLAKOW, Dietrich
Inventors: PETERS, Jochen; MATUSOV, Evgeny; MEYER, Carsten; KLAKOW, Dietrich
IPC classification: G06F17/21
CPC classification: G06F17/21, G06F17/27, G06F17/2765
Abstract: The invention relates to a method, a computer program product, a segmentation system and a user interface for structuring an unstructured text by making use of statistical models trained on annotated training data. The method performs text segmentation into text sections and assigns labels to text sections as section headings. The performed segmentation and assignment is provided to a user for general review. Additionally, alternative segmentations and label assignments are provided to the user being capable to select alternative segmentations and alternative labels as well as to enter a user defined segmentation and user defined label. In response to the modifications introduced by the user, a plurality of different actions are initiated incorporating the re-segmentation and re-labelling of successive parts of the document or the entire document. Furthermore the method comprises a learning functionality, logging and analyzing user introduced modifications for adaptation of the method to the user's preferences and for further training of the statistical models.

5. Invention publication

EP1797506A1 AUTOMATIC TEXT CORRECTION [Under examination - Published]
Publication (announcement) No.: EP1797506A1
Publication (announcement) date: 2007-06-20
Application No.: EP05786831.7
Filing date: 2005-09-28
Applicants: Koninklijke Philips Electronics N.V.; Philips Intellectual Property & Standards GmbH
Inventors: PETERS, Jochen; MATUSOV, Evgeny
IPC classification: G06F17/22, G10L15/26
CPC classification: G06F17/273, G06F17/2282, G10L15/26
Abstract: The present invention provides a method of generating text transformation rules for speech to text transcription systems. The text transformation rules are generated by means of comparing an erroneous text generated by a speech to text transcription system with a correct reference text. Comparison of erroneous and reference text allows to derive a set of text transformation rules that are evaluated by means of a strict application to the training text and successive comparison with the reference text. Evaluation of text transformation rules provides a sufficient approach to determine which of the automatically generated text transformation rules provide an enhancement or degradation of the erroneous text. In this way only those text transformation rules of the set of text transformation rules are selected that guarantee an enhancement of the erroneous text. In this way systematic errors of an automatic speech recognition or natural language process system can be effectively compensated.

6. Invention publication

EP1687807A2 TOPIC SPECIFIC MODELS FOR TEXT FORMATTING AND SPEECH RECOGNITION [Granted - Assigned]
Publication (announcement) No.: EP1687807A2
Publication (announcement) date: 2006-08-09
Application No.: EP04799133.6
Filing date: 2004-11-12
Applicants: Philips Intellectual Property & Standards GmbH; Koninklijke Philips Electronics N.V.
Inventors: PETERS, Jochen; MATUSOV, Evgeny; MEYER, Carsten; KLAKOW, Dietrich
IPC classification: G10L15/22, G10L15/18, G06F17/27, G06F17/21
CPC classification: G10L15/183, G06F17/211, G06F17/2715, G10L15/32
Abstract: The present invention relates to a method, a computer system and a computer program product for speech recognition and/or text formatting by making use of topic specific statistical models. A text document which may be obtained from a first speech recognition pass is subject to segmentation and to an assignment of topic specific models for each obtained section. Each model of the set of models provides statistic information about language model probabilities, about text processing or formatting rules, as e.g. the interpretation of commands for punctuation, formatting, text highlighting or of ambiguous text portions requiring specific formatting, as well as a specific vocabulary being characteristic for each section of the recognized text. Furthermore, other properties of a speech recognition and/or formatting system (such as e.g. settings for the speaking rate) may be encoded in the statistical models. The models themselves are generated on the basis of annotated training data and/or by manual coding. Based on the assignment of models to sections of text an improved speech recognition and/or text formatting procedure is performed.
