Patent Quick Search: fast search of global patents, free patent database for commercial use - IPRDB

81. Patent application

US20140343939A1 Discriminative Training of Document Transcription System (status: active)
Publication No.: US20140343939A1
Publication date: 2014-11-20
Application No.: US14244053
Filing date: 2014-04-03
Applicant: MModal IP LLC
Inventors: Lambert Mathias; Girija Yegnanarayanan; Juergen Fritsch
IPC classification: G10L15/06, G10L15/26
CPC classification: G10L15/063, G06F17/271, G06F17/2775, G06F17/28, G10L15/02, G10L15/183, G10L15/193, G10L15/26, G10L2015/0631, G10L2015/0633
Abstract: A system is provided for training an acoustic model for use in speech recognition. In particular, such a system may be used to perform training based on a spoken audio stream and a non-literal transcript of the spoken audio stream. Such a system may identify text in the non-literal transcript which represents concepts having multiple spoken forms. The system may attempt to identify the actual spoken form in the audio stream which produced the corresponding text in the non-literal transcript, and thereby produce a revised transcript which more accurately represents the spoken audio stream. The revised, and more accurate, transcript may be used to train the acoustic model using discriminative training techniques, thereby producing a better acoustic model than that which would be produced using conventional techniques, which perform training based directly on the original non-literal transcript.
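
As a rough illustration of the transcript-revision step in the abstract above, the sketch below replaces each written concept in a non-literal transcript with the spoken form that best matches what was heard. Everything here is an assumption made for illustration: the `SPOKEN_FORMS` table, the function names, and the use of word-level string similarity as a stand-in for matching against the audio stream. It is not the method claimed in the patent.

```python
from difflib import SequenceMatcher

# Hypothetical table: written concept -> candidate spoken forms.
SPOKEN_FORMS = {
    "5 mg": ["five milligrams", "five m g"],
    "10/1/2004": ["october first two thousand four", "ten one oh four"],
}


def pick_spoken_form(concept, heard):
    """Return the candidate spoken form most similar to the heard words.

    `heard` stands in for a recognizer hypothesis over the audio region
    that produced `concept`; a real system would score against the audio.
    """
    candidates = SPOKEN_FORMS.get(concept, [concept])
    heard_words = heard.split()
    return max(
        candidates,
        key=lambda form: SequenceMatcher(None, form.split(), heard_words).ratio(),
    )


def revise_transcript(tokens, heard_by_concept):
    """Replace multi-form concepts with their most likely spoken form."""
    revised = []
    for token in tokens:
        if token in SPOKEN_FORMS:
            revised.append(pick_spoken_form(token, heard_by_concept.get(token, "")))
        else:
            revised.append(token)
    return " ".join(revised)
```

The revised transcript, rather than the original non-literal one, would then serve as the reference text for acoustic model training.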

82. Patent application

US20130332164A1 NAME RECOGNITION SYSTEM (status: active)
Publication No.: US20130332164A1
Publication date: 2013-12-12
Application No.: US13492720
Filing date: 2012-06-08
Applicant: Devang K. Nalk
Inventor: Devang K. Nalk
IPC classification: G10L15/06
CPC classification: G10L15/187, G10L15/30, G10L2015/025, G10L2015/0633
Abstract: A speech recognition system uses, in one embodiment, an extended phonetic dictionary that is obtained by processing words in a user's set of databases, such as a user's contacts database, with a set of pronunciation guessers. The speech recognition system can use a conventional phonetic dictionary and the extended phonetic dictionary to recognize speech inputs that are user requests to use the contacts database, for example, to make a phone call, etc. The extended phonetic dictionary can be updated in response to changes in the contacts database, and the set of pronunciation guessers can include pronunciation guessers for a plurality of locales, each locale having its own pronunciation guesser.
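
The idea of an extended phonetic dictionary built from a contacts database can be sketched as follows. The two guessers are deliberately toy letter-based rules and, like the `GUESSERS` table and all function names, are hypothetical; a real pronunciation guesser would apply locale-specific grapheme-to-phoneme models.

```python
def guess_en_us(word):
    """Toy en_US guesser: spell the word out letter by letter."""
    return " ".join(ch for ch in word.lower() if ch.isalpha())


def guess_de_de(word):
    """Toy de_DE guesser: like en_US, but 'w' is pronounced as 'v'."""
    return " ".join("v" if ch == "w" else ch for ch in word.lower() if ch.isalpha())


# One (hypothetical) pronunciation guesser per locale.
GUESSERS = {"en_US": guess_en_us, "de_DE": guess_de_de}


def build_extended_dictionary(contact_names, guessers=GUESSERS):
    """Map each word in the contact names to its guessed pronunciations."""
    extended = {}
    for name in contact_names:
        for word in name.split():
            pronunciations = extended.setdefault(word.lower(), set())
            for guess in guessers.values():
                pronunciations.add(guess(word))
    return extended
```

In this sketch, responding to a change in the contacts database amounts to calling `build_extended_dictionary` again on the updated names; an incremental update would be the obvious refinement.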

83. Patent application

US20130166297A1 Discriminative Training of Document Transcription System (status: active)
Publication No.: US20130166297A1
Publication date: 2013-06-27
Application No.: US13773928
Filing date: 2013-02-22
Applicant: MULTIMODAL TECHNOLOGIES, LLC
Inventors: Lambert Mathias; Girija Yegnanarayanan; Juergen Fritsch
IPC classification: G10L15/06
CPC classification: G10L15/063, G06F17/271, G06F17/2775, G06F17/28, G10L15/02, G10L15/183, G10L15/193, G10L15/26, G10L2015/0631, G10L2015/0633
Abstract: A system is provided for training an acoustic model for use in speech recognition. In particular, such a system may be used to perform training based on a spoken audio stream and a non-literal transcript of the spoken audio stream. Such a system may identify text in the non-literal transcript which represents concepts having multiple spoken forms. The system may attempt to identify the actual spoken form in the audio stream which produced the corresponding text in the non-literal transcript, and thereby produce a revised transcript which more accurately represents the spoken audio stream. The revised, and more accurate, transcript may be used to train the acoustic model using discriminative training techniques, thereby producing a better acoustic model than that which would be produced using conventional techniques, which perform training based directly on the original non-literal transcript.

84. Patent application

US20130163887A1 OBJECT CLASSIFICATION/RECOGNITION APPARATUS AND METHOD (status: active)
Publication No.: US20130163887A1
Publication date: 2013-06-27
Application No.: US13724220
Filing date: 2012-12-21
Applicants: HONDA MOTOR CO., LTD.; NATIONAL UNIVERSITY CORPORATION KOBE UNIVERSITY
Inventors: Mikio NAKANO; Naoto IWAHASHI; Yasuo ARIKI; Yuko OZASA; Takahiro HORI; Ryohei NAKATANI
IPC classification: G06K9/62
CPC classification: G06K9/6267, G06K9/6254, G06K9/6277, G06K9/6293, G10L15/01, G10L2015/025, G10L2015/0633
Abstract: An apparatus is provided for classifying targets into a known-object group and an unknown-object group. The apparatus includes a speech/image data storage unit configured to store a spoken sound of a name of an object and an image of the object; a unit configured to calculate a speech confidence level of a speech for the name of the object with reference to a spoken sound of a name of a known object; a unit configured to calculate an image confidence level of an image of an object with respect to an image of a known object; and a unit configured to compare an evaluation value, which is obtained by combining the speech confidence level and image confidence level, with a threshold value, and classify a target object into an object group determined according to whether the spoken sound of the name and the image are known or unknown.
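
The comparison step in the abstract above (combine a speech confidence and an image confidence into one evaluation value, then compare it with a threshold) might look like the sketch below. The abstract does not say how the two confidences are combined, so the weighted sum, the default `weight` and `threshold`, and the function names are all assumptions.

```python
def evaluation_value(speech_conf, image_conf, weight=0.5):
    """Combine the two confidence levels (assumed: a weighted sum)."""
    return weight * speech_conf + (1.0 - weight) * image_conf


def classify(speech_conf, image_conf, weight=0.5, threshold=0.5):
    """Assign a target to the known-object or unknown-object group."""
    score = evaluation_value(speech_conf, image_conf, weight)
    return "known" if score >= threshold else "unknown"
```

For example, `classify(0.9, 0.8)` puts the target in the known-object group, while `classify(0.2, 0.3)` puts it in the unknown-object group.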

85. Granted patent

US08412521B2 Discriminative training of document transcription system (status: active)
Publication No.: US08412521B2
Publication date: 2013-04-02
Application No.: US11228607
Filing date: 2005-09-16
Applicants: Lambert Mathias; Girija Yegnanarayanan; Juergen Fritsch
Inventors: Lambert Mathias; Girija Yegnanarayanan; Juergen Fritsch
IPC classification: G10L15/26, G10L15/18
CPC classification: G10L15/063, G06F17/271, G06F17/2775, G06F17/28, G10L15/02, G10L15/183, G10L15/193, G10L15/26, G10L2015/0631, G10L2015/0633
Abstract: A system is provided for training an acoustic model for use in speech recognition. In particular, such a system may be used to perform training based on a spoken audio stream and a non-literal transcript of the spoken audio stream. Such a system may identify text in the non-literal transcript which represents concepts having multiple spoken forms. The system may attempt to identify the actual spoken form in the audio stream which produced the corresponding text in the non-literal transcript, and thereby produce a revised transcript which more accurately represents the spoken audio stream. The revised, and more accurate, transcript may be used to train the acoustic model using discriminative training techniques, thereby producing a better acoustic model than that which would be produced using conventional techniques, which perform training based directly on the original non-literal transcript.

86. Patent application

US20120232885A1 SYSTEM AND METHOD FOR BUILDING DIVERSE LANGUAGE MODELS (status: active)
Publication No.: US20120232885A1
Publication date: 2012-09-13
Application No.: US13042890
Filing date: 2011-03-08
Applicants: Luciano De Andrade BARBOSA; Srinivas BANGALORE
Inventors: Luciano De Andrade BARBOSA; Srinivas BANGALORE
IPC classification: G06F17/27, G10L15/00
CPC classification: G06F17/28, G06F17/21, G06F17/27, G06F17/2705, G06F17/2715, G06F17/2735, G06F17/2765, G10L2015/0633
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for collecting web data in order to create diverse language models. A system configured to practice the method first crawls, such as via a crawler operating on a computing device, a set of documents in a network of interconnected devices according to a visitation policy, wherein the visitation policy is configured to focus on novelty regions for a current language model built from previous crawling cycles by crawling documents whose vocabulary is considered likely to fill gaps in the current language model. A language model from a previous cycle can be used to guide the creation of a language model in the following cycle. The novelty regions can include documents with high perplexity values over the current language model.
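
To make the "high perplexity over the current language model" criterion concrete, the sketch below ranks candidate documents by their perplexity under a unigram model with add-one smoothing. The choice of a unigram model, the smoothing scheme, and the function names are assumptions for illustration; the abstract does not specify the model type or the visitation policy's scoring.

```python
import math
from collections import Counter


def train_unigram(texts):
    """Count word occurrences across the texts crawled so far."""
    return Counter(word for text in texts for word in text.lower().split())


def perplexity(counts, text):
    """Perplexity of `text` under an add-one-smoothed unigram model."""
    words = text.lower().split()
    if not words:
        return 0.0  # an empty document offers nothing novel
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen words
    log_prob = sum(math.log((counts[w] + 1) / (total + vocab)) for w in words)
    return math.exp(-log_prob / len(words))


def rank_by_novelty(counts, documents):
    """Order documents so that the highest-perplexity ones come first."""
    return sorted(documents, key=lambda doc: perplexity(counts, doc), reverse=True)
```

A crawler following this idea would visit the top-ranked documents first, retrain the model on them, and use the new model to guide the next cycle.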

87. Patent application

US20060241943A1 Medical vocabulary templates in speech recognition (status: pending, published)
Publication No.: US20060241943A1
Publication date: 2006-10-26
Application No.: US11477121
Filing date: 2006-06-29
Applicants: Anuthep Benja-Athon; Sirikit Benja-Athon
Inventors: Anuthep Benja-Athon; Sirikit Benja-Athon
IPC classification: G10L15/26
CPC classification: G06Q10/10, G06Q50/22, G10L15/06, G10L15/285, G10L25/48, G10L2015/0633, G10L2015/223
Abstract: A system of templates of words and terms used in medicine and surgery by physicians, for optimizing the outcomes of the speech recognition process of converting digital voice data produced by physicians into digital text data, comprises words and terms used in medical and surgical specialties. The medical and surgical vocabulary templates comprise individual logical arrangements or orders of related words and terms used by physicians to communicate and record health and health-care information and data.

88. Granted patent

US07120582B1 Expanding an effective vocabulary of a speech recognition system (status: active)
Publication No.: US07120582B1
Publication date: 2006-10-10
Application No.: US09390370
Filing date: 1999-09-07
Applicants: Jonathan H. Young; Haakon L. Chevalier; Laurence S. Gillick; Toffee A. Albina; Marlboro B. Moore, III; Paul E. Rensing; Jonathan P. Yamron
Inventors: Jonathan H. Young; Haakon L. Chevalier; Laurence S. Gillick; Toffee A. Albina; Marlboro B. Moore, III; Paul E. Rensing; Jonathan P. Yamron
IPC classification: G10L15/00, G10L15/06
CPC classification: G10L15/063, G10L2015/0633, G10L2015/0635, G10L2015/0636
Abstract: The invention provides techniques for creating and using fragmented word models to increase the effective size of an active vocabulary of a speech recognition system. The active vocabulary represents all words and word fragments that the speech recognition system is able to recognize. Each word may be represented by a combination of acoustic models. As such, the active vocabulary represents the combinations of acoustic models that the speech recognition system may compare to a user's speech to identify acoustic models that best match the user's speech. The effective size of the active vocabulary may be increased by dividing words into constituent components or fragments (for example, prefixes, suffixes, separators, infixes, and roots) and including each component as a separate entry in the active vocabulary. Thus, for example, a list of words and their plural forms (for example, “book, books, cook, cooks, hook, hooks, look and looks”) may be represented in the active vocabulary using the words (for example, “book, cook, hook and look”) and an entry representing the suffix that makes the words plural (for example, “+s”, where the “+” preceding the “s” indicates that “+s” is a suffix). For a large list of words, and ignoring the entry associated with the suffix, this technique may reduce the number of vocabulary entries needed to represent the list of words considerably.
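
The plural-suffix example in the abstract above can be reproduced with a few lines of code. This is a minimal sketch that handles only the "+s" suffix mentioned in the abstract; the function name and the rule "a word ending in s whose stem is also in the list is a plural" are assumptions, and the patent's fragment types (prefixes, infixes, separators, roots) are not modeled.

```python
def fragment_vocabulary(words):
    """Replace plural forms by a single shared '+s' suffix entry.

    A word is treated as a plural only if it ends in 's' and the
    remaining stem is itself in the word list.
    """
    word_set = set(words)
    entries = set()
    for word in word_set:
        if word.endswith("s") and word[:-1] in word_set:
            entries.add("+s")  # one suffix entry covers every plural
        else:
            entries.add(word)
    return entries
```

Applied to the eight words from the abstract (book, books, cook, cooks, hook, hooks, look, looks), this yields five entries: the four singular words plus "+s".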
