Quick Patent Search - Search global patents quickly, free patent database for commercial use - IPRDB

1. Granted invention patent

US6016471A Method and apparatus using decision trees to generate and score multiple pronunciations for a spelled word (lapsed)
Publication No.: US6016471A
Publication date: 2000-01-18
Application No.: US67764
Filing date: 1998-04-29
Applicant: Roland Kuhn, Jean-Claude Junqua, Matteo Contolini
Inventor: Roland Kuhn, Jean-Claude Junqua, Matteo Contolini
IPC classification: G10L13/08, G10L5/04
CPC classification: G10L13/08
Abstract: The mixed decision tree includes a network of yes-no questions about adjacent letters in a spelled word sequence and also about adjacent phonemes in the phoneme sequence corresponding to the spelled word sequence. Leaf nodes of the mixed decision tree provide information about which phonetic transcriptions are most probable. Using the mixed trees, scores are developed for each of a plurality of possible pronunciations, and these scores can be used to select the best pronunciation as well as to rank pronunciations in order of probability. The pronunciations generated by the system can be used in speech synthesis and speech recognition applications as well as lexicography applications.
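The scoring idea in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class names, the toy trees and probabilities, and the simplifying assumption of exactly one phoneme per letter are all hypothetical. Each letter has a small tree whose questions may look at neighbouring letters or neighbouring phonemes, and a pronunciation's score is the product of the leaf probabilities along the word.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


@dataclass
class Context:
    """What a yes-no question may inspect (hypothetical structure)."""
    letters: str
    phonemes: List[str]
    pos: int


@dataclass
class Leaf:
    probs: Dict[str, float]  # P(phoneme at pos | answers along the path)


@dataclass
class Node:
    question: Callable[[Context], bool]
    yes: Union["Node", Leaf]
    no: Union["Node", Leaf]


def leaf_for(tree: Union[Node, Leaf], ctx: Context) -> Leaf:
    """Walk the tree by answering each question until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.yes if tree.question(ctx) else tree.no
    return tree


def score(trees, letters: str, phonemes: List[str], floor: float = 1e-6) -> float:
    """Product of leaf probabilities; assumes one phoneme per letter."""
    total = 1.0
    for i, letter in enumerate(letters):
        leaf = leaf_for(trees[letter], Context(letters, phonemes, i))
        total *= leaf.probs.get(phonemes[i], floor)
    return total


def rank(trees, letters: str, candidates: List[List[str]]) -> List[List[str]]:
    """Order candidate pronunciations from most to least probable."""
    return sorted(candidates, key=lambda p: score(trees, letters, p), reverse=True)


# Toy trees with made-up probabilities, for illustration only.
TREES = {
    # Letter question: is the next letter e, i, or y?
    "c": Node(
        lambda c: c.pos + 1 < len(c.letters) and c.letters[c.pos + 1] in "eiy",
        yes=Leaf({"s": 0.9, "k": 0.1}),
        no=Leaf({"k": 0.9, "s": 0.1}),
    ),
    # Phoneme question (this is what makes the tree "mixed"):
    # is the previous phoneme /k/?
    "a": Node(
        lambda c: c.pos > 0 and c.phonemes[c.pos - 1] == "k",
        yes=Leaf({"ae": 0.7, "ey": 0.3}),
        no=Leaf({"ae": 0.5, "ey": 0.5}),
    ),
    "t": Leaf({"t": 0.95, "-": 0.05}),
}

CANDIDATES = [["s", "ae", "t"], ["k", "ey", "t"], ["k", "ae", "t"]]
RANKED = rank(TREES, "cat", CANDIDATES)
```

With these toy numbers the ranking puts `k ae t` first and `s ae t` last, because the phoneme question for "a" rewards candidates whose preceding phoneme is /k/.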

2. Granted invention patent

US06233561B1 Method for goal-oriented speech translation in hand-held devices using meaning extraction and dialogue (in force)
Publication No.: US06233561B1
Publication date: 2001-05-15
Application No.: US09290628
Filing date: 1999-04-12
Applicant: Jean-Claude Junqua, Roland Kuhn, Matteo Contolini, Murat Karaorman, Ken Field, Michael Galler, Yi Zhao
Inventor: Jean-Claude Junqua, Roland Kuhn, Matteo Contolini, Murat Karaorman, Ken Field, Michael Galler, Yi Zhao
IPC classification: G10L15/22
CPC classification: G10L15/1822, G10L15/1815
Abstract: A computer-implemented method and apparatus are provided for processing a spoken request from a user. A speech recognizer converts the spoken request into a digital format. A frame data structure associates semantic components of the digitized spoken request with predetermined slots. The slots are indicative of data which are used to achieve a predetermined goal. A speech understanding module which is connected to the speech recognizer and to the frame data structure determines semantic components of the spoken request. The slots are populated based upon the determined semantic components. A dialog manager which is connected to the speech understanding module may determine at least one slot which is unpopulated based upon the determined semantic components and in a preferred embodiment may provide confirmation of the populated slots. A computer-generated request is formulated in order for the user to provide data related to the unpopulated slot. The method and apparatus are well-suited (but not limited) to use in a hand-held speech translation device.
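The frame-and-slot mechanism described in this abstract can be sketched roughly as follows. The goal (a hotel booking), the slot names, and the prompt wording are assumptions made for illustration; the abstract does not specify them, and the speech recognizer and understanding module are replaced here by a ready-made dictionary of semantic components.

```python
from typing import Dict, List, Optional

# Hypothetical slots for a hypothetical goal (booking a hotel room).
SLOTS = ("city", "date", "nights")


def new_frame() -> Dict[str, Optional[str]]:
    """Create a frame in which every predetermined slot starts unpopulated."""
    return {slot: None for slot in SLOTS}


def populate(frame: Dict[str, Optional[str]],
             semantic_components: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Fill slots from semantic components; ignore components with no slot."""
    for slot, value in semantic_components.items():
        if slot in frame:
            frame[slot] = value
    return frame


def unpopulated(frame: Dict[str, Optional[str]]) -> List[str]:
    """Slots the dialog manager still needs, in frame order."""
    return [slot for slot, value in frame.items() if value is None]


def next_prompt(frame: Dict[str, Optional[str]]) -> Optional[str]:
    """Formulate a request for the first missing slot, or None if complete."""
    missing = unpopulated(frame)
    if missing:
        return f"Please tell me the {missing[0]}."
    return None


# Stand-in for the output of the speech understanding module.
FRAME = populate(new_frame(), {"city": "Paris", "nights": "2", "color": "red"})
PROMPT = next_prompt(FRAME)
```

Here the `color` component is dropped because no slot exists for it, and the dialog manager asks only for the `date` slot, which is the one left unpopulated.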

3. Granted invention patent

US06230131B1 Method for generating spelling-to-pronunciation decision tree (lapsed)
Publication No.: US06230131B1
Publication date: 2001-05-08
Application No.: US09069308
Filing date: 1998-04-29
Applicant: Roland Kuhn, Jean-Claude Junqua, Matteo Contolini
Inventor: Roland Kuhn, Jean-Claude Junqua, Matteo Contolini
IPC classification: G10L13/08
CPC classification: G10L13/08
Abstract: Decision trees are used to store a series of yes-no questions that can be used to convert spelled-word letter sequences into pronunciations. Letter-only trees, having internal nodes populated with questions about letters in the input sequence, generate one or more pronunciations based on probability data stored in the leaf nodes of the tree. The pronunciations may then be improved by processing them using mixed trees which are populated with questions about letters in the sequence and also questions about phonemes associated with those letters. The mixed tree screens out pronunciations that would not occur in natural speech, thereby greatly improving the results of the letter-to-pronunciation transformation.

4. Granted invention patent

US06711541B1 Technique for developing discriminative sound units for speech recognition and allophone modeling (in force)
Publication No.: US06711541B1
Publication date: 2004-03-23
Application No.: US09390434
Filing date: 1999-09-07
Applicant: Roland Kuhn, Jean-Claude Junqua, Matteo Contolini
Inventor: Roland Kuhn, Jean-Claude Junqua, Matteo Contolini
IPC classification: G10L15/04
CPC classification: G10L15/063, G10L2015/025
Abstract: A set of models is developed to represent sound units and these models are then used with the incorrect sound units to determine which generate high likelihood scores. The models generating high likelihood scores for the incorrect sound units represent those that are more likely to be confused. The resulting confusability data may then be used in generating more discriminative speech models and in subsequent pruning of the acoustic decision tree. The confusability data may also be used to develop confusability predictors used for rejection during search and in developing continuous speech recognition models that are optimized to minimize confusability.
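The confusability measurement in this abstract can be illustrated with a deliberately simplified sketch. It assumes each sound unit is modelled by a single one-dimensional Gaussian, which is a stand-in chosen for brevity and not what the patent describes; the unit names, model parameters, and samples are invented. Each model is scored on data from the other ("incorrect") units, and the pair with the highest average log-likelihood is reported as the most confusable.

```python
import math
from typing import Dict, List, Tuple


def log_likelihood(x: float, mean: float, std: float) -> float:
    """Log density of a 1-D Gaussian (a stand-in for a real acoustic model)."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def confusability(models: Dict[str, Tuple[float, float]],
                  samples: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """Average log-likelihood each model assigns to samples of a different unit.

    Keys are (true unit, incorrect model); higher values mean the incorrect
    model fits the data well, so the two units are easier to confuse.
    """
    table = {}
    for true_unit, xs in samples.items():
        for unit, (mean, std) in models.items():
            if unit == true_unit:
                continue
            table[(true_unit, unit)] = sum(
                log_likelihood(x, mean, std) for x in xs
            ) / len(xs)
    return table


def most_confusable(table: Dict[Tuple[str, str], float]) -> Tuple[str, str]:
    """Pair whose incorrect model produced the highest average score."""
    return max(table, key=table.get)


# Invented (mean, std) models and observations for three sound units.
MODELS = {"b": (0.0, 1.0), "p": (0.5, 1.0), "s": (5.0, 1.0)}
SAMPLES = {"b": [-0.2, 0.1, 0.3], "p": [0.4, 0.6, 0.7], "s": [4.8, 5.1, 5.3]}

TABLE = confusability(MODELS, SAMPLES)
WORST_PAIR = most_confusable(TABLE)
```

In this toy data the models for "b" and "p" overlap heavily, so that pair scores highest, while "s" sits far from both. A table like this is the kind of confusability data the abstract says could guide model training or decision-tree pruning.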

5. Granted invention patent

US06643620B1 Voice activated controller for recording and retrieving audio/video programs (in force)
Publication No.: US06643620B1
Publication date: 2003-11-04
Application No.: US09270262
Filing date: 1999-03-15
Applicant: Matteo Contolini, Jean-Claude Junqua, Roland Kuhn
Inventor: Matteo Contolini, Jean-Claude Junqua, Roland Kuhn
IPC classification: G10L15/00
CPC classification: H04N21/440236, G10L2015/223, G11B27/002, G11B27/105, G11B27/107, G11B27/11, G11B27/34, G11B27/36, G11B2220/2516, G11B2220/2545, G11B2220/2562, G11B2220/41, G11B2220/455, G11B2220/65, G11B2220/90, H04N5/781, H04N5/782, H04N5/85, H04N21/42203, H04N21/4334, H04N21/47214, H04N21/84, H04N21/8405
Abstract: The system includes a database of program records representing A/V programs which are available for recording. The system also includes an A/V recording device for receiving a recording command and recording the A/V program. A speech recognizer is provided for receiving the spoken request and translating the spoken request into a text stream having a plurality of words. A natural language processor receives the text stream and processes the words for resolving a semantic content of the spoken request. The natural language processor places the meaning of the words into a task frame having a plurality of key word slots. A dialogue system analyzes the task frame for determining if a sufficient number of key word slots have been filled and prompts the user for additional information for filling empty slots. The dialogue system searches the database of program records using the key words placed within the task frame for selecting the A/V program and generating the recording command for use by the A/V recording device.

6. Granted invention patent

US06983244B2 Method and apparatus for improved speech recognition with supplementary information (in force)
Publication No.: US06983244B2
Publication date: 2006-01-03
Application No.: US10652146
Filing date: 2003-08-29
Applicant: Jean-Claude Junqua, Roland Kuhn, Matteo Contolini, Rathinavelu Chengalvarayan
Inventor: Jean-Claude Junqua, Roland Kuhn, Matteo Contolini, Rathinavelu Chengalvarayan
IPC classification: G10L15/22
CPC classification: H04M1/271, G10L15/08, G10L15/10, G10L15/22
Abstract: A method for improving recognition results of a speech recognizer uses supplementary information to confirm recognition results. A user inputs speech to a speech recognizer. The speech recognizer resides on a mobile device or on a server at a remote location. The speech recognizer determines a recognition result based on the input speech. A confidence measure is calculated for the recognition result. If the confidence measure is below a threshold, the user is prompted for supplementary data. The supplementary data is determined dynamically based on ambiguities between the input speech and the recognition result, wherein the supplementary data will distinguish the input speech from potentially incorrect results. The supplementary data may be a subset of alphanumeric characters that comprise the input speech, or other data associated with a desired result, such as an area code or location. The user may provide the supplementary data verbally, or manually using a keypad, touchpad, touchscreen, or stylus pen.
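A rough sketch of the confidence-threshold flow in this abstract is given below. It is only one possible reading: the n-best list, the threshold values, and the choice of "first differing character" as the dynamically selected supplementary data are assumptions, and the `ask` callback is a hypothetical stand-in for prompting the user by voice or keypad.

```python
from typing import Callable, List, Optional, Tuple


def distinguishing_position(candidates: List[str]) -> Optional[int]:
    """First character index at which the candidates are not all identical."""
    longest = max(len(c) for c in candidates)
    for i in range(longest):
        # Slicing (not indexing) so shorter strings contribute "" safely.
        if len({c[i:i + 1] for c in candidates}) > 1:
            return i
    return None


def resolve(nbest: List[Tuple[str, float]],
            threshold: float,
            ask: Callable[[int], str]) -> str:
    """Return the top result, asking for one character if confidence is low.

    `nbest` is assumed to be sorted by confidence, highest first.
    """
    best, confidence = nbest[0]
    if confidence >= threshold:
        return best  # confident enough, no supplementary data needed
    names = [name for name, _ in nbest]
    pos = distinguishing_position(names)
    if pos is None:
        return best  # candidates are identical, nothing to ask
    letter = ask(pos)  # supplementary data supplied by the user
    matches = [n for n in names if n[pos:pos + 1].lower() == letter.lower()]
    return matches[0] if matches else best


# Invented recognition hypotheses with invented confidence values.
NBEST = [("Johnson", 0.48), ("Jonson", 0.45)]
```

For example, `resolve(NBEST, 0.6, lambda pos: "n")` asks for the third letter, because that is where the two names first differ, and then selects "Jonson". With a threshold of 0.4 the top hypothesis is accepted without any prompt.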

7. Invention patent application

US20050049860A1 Method and apparatus for improved speech recognition with supplementary information (in force)
Publication No.: US20050049860A1
Publication date: 2005-03-03
Application No.: US10652146
Filing date: 2003-08-29
Applicant: Jean-Claude Junqua, Roland Kuhn, Matteo Contolini, Rathinavelu Chengalvarayan
Inventor: Jean-Claude Junqua, Roland Kuhn, Matteo Contolini, Rathinavelu Chengalvarayan
IPC classification: G10L15/08, G10L15/10, G10L15/22, H04M1/27, G10L15/00
CPC classification: H04M1/271, G10L15/08, G10L15/10, G10L15/22
Abstract: A method for improving recognition results of a speech recognizer uses supplementary information to confirm recognition results. A user inputs speech to a speech recognizer. The speech recognizer resides on a mobile device or on a server at a remote location. The speech recognizer determines a recognition result based on the input speech. A confidence measure is calculated for the recognition result. If the confidence measure is below a threshold, the user is prompted for supplementary data. The supplementary data is determined dynamically based on ambiguities between the input speech and the recognition result, wherein the supplementary data will distinguish the input speech from potentially incorrect results. The supplementary data may be a subset of alphanumeric characters that comprise the input speech, or other data associated with a desired result, such as an area code or location. The user may provide the supplementary data verbally, or manually using a keypad, touchpad, touchscreen, or stylus pen.

8. Granted invention patent

US06571208B1 Context-dependent acoustic models for medium and large vocabulary speech recognition with eigenvoice training (in force)
Publication No.: US06571208B1
Publication date: 2003-05-27
Application No.: US09450392
Filing date: 1999-11-29
Applicant: Roland Kuhn, Jean-Claude Junqua, Matteo Contolini
Inventor: Roland Kuhn, Jean-Claude Junqua, Matteo Contolini
IPC classification: G01L17/00
CPC classification: G10L15/07
Abstract: A reduced dimensionality eigenvoice analytical technique is used during training to develop context-dependent acoustic models for allophones. The eigenvoice technique is also used during run time upon the speech of a new speaker. The technique removes individual speaker idiosyncrasies, to produce more universally applicable and robust allophone models. In one embodiment the eigenvoice technique is used to identify the centroid of each speaker, which may then be “subtracted out” of the recognition equation. In another embodiment maximum likelihood estimation techniques are used to develop common decision tree frameworks that may be shared across all speakers when constructing the eigenvoice representation of speaker space.

9. Granted invention patent

US06233553B1 Method and system for automatically determining phonetic transcriptions associated with spelled words (lapsed)
Publication No.: US06233553B1
Publication date: 2001-05-15
Application No.: US09148912
Filing date: 1998-09-04
Applicant: Matteo Contolini, Jean-Claude Junqua, Roland Kuhn
Inventor: Matteo Contolini, Jean-Claude Junqua, Roland Kuhn
IPC classification: G10L19/04
CPC classification: G10L15/065, G10L2015/086
Abstract: New entries are added to the lexicon by entering them as spelled words. A transcription generator, such as a decision-tree-based phoneme or morpheme transcription generator, converts each spelled word into a set of n-best transcriptions or sequences. Meanwhile, user input or automatically generated speech corresponding to the spelled word is processed by an automatic speech recognizer and the recognizer rescores the transcriptions or sequences produced by the transcription generator. One or more of the highest scored (highest confidence) transcriptions may be added to the lexicon to update it. If desired, the spelled word-pronunciation pairs generated by the system can be used to retrain the transcription generator, making the system adaptive or self-learning.
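The generate-then-rescore loop in this abstract can be sketched as below. The transcription generator and the speech recognizer are not implemented: the n-best list and the acoustic scores are invented values, and `acoustic_score` is a hypothetical callback standing in for the recognizer's scoring of a transcription against the spoken word.

```python
from typing import Callable, Dict, List, Tuple

Transcription = Tuple[str, ...]


def rescore(nbest: List[Transcription],
            acoustic_score: Callable[[Transcription], float]
            ) -> List[Tuple[Transcription, float]]:
    """Re-rank generator output by the recognizer's score, best first."""
    scored = [(t, acoustic_score(t)) for t in nbest]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def update_lexicon(lexicon: Dict[str, List[Transcription]],
                   word: str,
                   nbest: List[Transcription],
                   acoustic_score: Callable[[Transcription], float],
                   keep: int = 1) -> Dict[str, List[Transcription]]:
    """Add the `keep` highest-scoring transcriptions for `word`."""
    entries = lexicon.setdefault(word, [])
    for transcription, _ in rescore(nbest, acoustic_score)[:keep]:
        if transcription not in entries:
            entries.append(transcription)
    return lexicon


# Invented n-best output of a transcription generator for "tomato".
T1 = ("t", "ah", "m", "ey", "t", "ow")
T2 = ("t", "ah", "m", "aa", "t", "ow")

# Invented log-likelihoods, as if the speaker had said the second variant.
FAKE_SCORES = {T1: -120.5, T2: -98.2}

LEXICON = update_lexicon({}, "tomato", [T1, T2], FAKE_SCORES.__getitem__)
```

Although the generator listed `T1` first, the recognizer's scores favour `T2`, so only `T2` enters the lexicon. Raising `keep` would admit more than one pronunciation per word.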

10. Granted invention patent

US06895376B2 Eigenvoice re-estimation technique of acoustic models for speech recognition, speaker identification and speaker verification (in force)
Publication No.: US06895376B2
Publication date: 2005-05-17
Application No.: US09849174
Filing date: 2001-05-04
Applicant: Florent Perronnin, Roland Kuhn, Patrick Nguyen, Jean-Claude Junqua
Inventor: Florent Perronnin, Roland Kuhn, Patrick Nguyen, Jean-Claude Junqua
IPC classification: G10L15/06, G10L17/00
CPC classification: G10L15/07, G10L17/02
Abstract: A reduced dimensionality eigenvoice analytical technique is used during training to develop context-dependent acoustic models for allophones. Re-estimation processes are performed to more strongly separate speaker-dependent and speaker-independent components of the speech model. The eigenvoice technique is also used during run time upon the speech of a new speaker. The technique removes individual speaker idiosyncrasies, to produce more universally applicable and robust allophone models. In one embodiment the eigenvoice technique is used to identify the centroid of each speaker, which may then be “subtracted out” of the recognition equation.
