Quick Patent Search - Fast global patent search, free patent database for commercial use - IPRDB

1. Granted invention patent

US06173258B2 Method for reducing noise distortions in a speech recognition system (status: no longer in force)
Publication No.: US06173258B2
Publication date: 2001-01-09
Application No.: US09177461
Filing date: 1998-10-22
Applicants: Xavier Menendez-Pidal, Miyuki Tanaka, Ruxin Chen, Duanpei Wu
Inventors: Xavier Menendez-Pidal, Miyuki Tanaka, Ruxin Chen, Duanpei Wu
IPC classification: G10L506
CPC classification: G10L21/0208, G10L15/02, G10L21/0264
Abstract: A method for reducing noise distortions in a speech recognition system comprises a feature extractor that includes a noise-suppressor, one or more time cosine transforms, and a normalizer. The noise-suppressor preferably performs a spectral subtraction process early in the feature extraction procedure. The time cosine transforms preferably operate in a centered-mode to each perform a transformation in the time domain. The normalizer calculates and utilizes normalization values to generate normalized features for speech recognition. The calculated normalization values preferably include mean values, left variances and right variances.
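
The normalizer step in the abstract above can be illustrated with a minimal sketch. The abstract does not define "left variance" and "right variance", so this sketch assumes they are the mean squared deviations of the samples below and above the mean, and that each sample is scaled by the standard deviation of its own side. The function name `asymmetric_normalize` is hypothetical and not taken from the patent.

```python
import math

def asymmetric_normalize(values):
    """Normalize one feature track with a mean and separate left/right variances.

    Assumption (not from the patent text): the left variance is computed from
    samples below the mean, the right variance from samples above it.
    """
    mean = sum(values) / len(values)
    left = [(v - mean) ** 2 for v in values if v < mean]
    right = [(v - mean) ** 2 for v in values if v > mean]
    # Fall back to 1.0 when one side is empty so the division stays defined.
    left_var = sum(left) / len(left) if left else 1.0
    right_var = sum(right) / len(right) if right else 1.0
    normalized = []
    for v in values:
        scale = math.sqrt(left_var if v < mean else right_var)
        normalized.append((v - mean) / scale)
    return mean, left_var, right_var, normalized


mean, left_var, right_var, normalized = asymmetric_normalize([0.0, 1.0, 2.0, 9.0])
print(mean, left_var, right_var, normalized)
```

Using two variances lets a skewed feature distribution, which noise tends to produce, be rescaled differently on each side of its mean.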

2. Granted invention patent

US06778959B1 System and method for speech verification using out-of-vocabulary models (status: no longer in force)
Publication No.: US06778959B1
Publication date: 2004-08-17
Application No.: US09691877
Filing date: 2000-10-18
Applicants: Duanpei Wu, Lex Olorenshaw, Xavier Menendez-Pidal, Ruxin Chen
Inventors: Duanpei Wu, Lex Olorenshaw, Xavier Menendez-Pidal, Ruxin Chen
IPC classification: G10L1314
CPC classification: G10L15/08
Abstract: A system and method for speech verification using out-of-vocabulary models includes a speech recognizer that has a model bank with system vocabulary word models, a garbage model, and one or more noise models. The model bank may reject an utterance or other sound as an invalid vocabulary word when the model bank identifies the utterance or other sound as corresponding to the garbage model or the noise models. Initial noise models may be selectively combined into a pre-determined number of final noise model clusters to effectively reduce the number of noise models that are utilized by the model bank of the speech recognizer to verify system vocabulary words.
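
The rejection rule described in the abstract above can be sketched as follows. This is an illustration only: it assumes the model bank yields one score per model, with higher meaning a better match, and that any best-scoring model outside the vocabulary (garbage or noise) triggers rejection. The names `verify_utterance`, `garbage` and `noise_1` are hypothetical.

```python
def verify_utterance(scores, vocabulary):
    """Return the recognized word, or None if a garbage/noise model scores best.

    scores: dict mapping model name -> recognition score (higher is better).
    vocabulary: set of model names that are valid system vocabulary words.
    """
    best_model = max(scores, key=scores.get)
    return best_model if best_model in vocabulary else None


vocabulary = {"play", "stop"}
print(verify_utterance({"play": -10.0, "garbage": -12.0, "noise_1": -30.0}, vocabulary))
print(verify_utterance({"play": -15.0, "garbage": -12.0, "noise_1": -30.0}, vocabulary))
```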

3. Granted invention patent

US06473735B1 System and method for speech verification using a confidence measure (status: no longer in force)
Publication No.: US06473735B1
Publication date: 2002-10-29
Application No.: US09553985
Filing date: 2000-04-20
Applicants: Duanpei Wu, Xavier Menendez-Pidal, Lex Olorenshaw, Ruxin Chen
Inventors: Duanpei Wu, Xavier Menendez-Pidal, Lex Olorenshaw, Ruxin Chen
IPC classification: G10L1506
CPC classification: G10L15/10, G10L2015/085
Abstract: The present invention comprises a system and method for speech verification using a confidence measure that includes a speech verifier which compares a differential score for a recognized word to a predetermined threshold value, where a recognized word is the word model that produced the highest recognition score. In one embodiment, a single threshold is used for each word in a vocabulary. In another embodiment, each word model has an associated threshold, so that a differential score for a recognized word is compared to a unique threshold associated with that word. In a further embodiment, pairs of confused words in the vocabulary are dealt with separately. If a confused word is the recognized word, the speech verifier compares the differential score to a threshold that depends on the word model that produced the next-highest recognition score. Different values for the various thresholds may maximize rejection accuracy or recognition accuracy. A trade-off between rejection accuracy and recognition accuracy may be made by utilizing an intermediate threshold value that is between a minimum threshold value and a maximum threshold value.
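
A minimal sketch of the threshold comparison in the abstract above is given below. The abstract does not define the differential score, so this sketch assumes it is the gap between the highest and the next-highest recognition scores. The per-word table `thresholds` and the fallback `default_threshold` stand in for the "unique threshold" and "single threshold" embodiments; the confused-word-pair embodiment is not covered. All names are hypothetical.

```python
def accept_recognition(scores, thresholds, default_threshold):
    """Return (recognized_word, accepted) using a differential-score check.

    Assumption: differential score = best score - next-highest score.
    scores must contain at least two word models.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_word, best_score = ranked[0]
    _, runner_up_score = ranked[1]
    differential = best_score - runner_up_score
    # Use the word-specific threshold when one exists, else the shared one.
    threshold = thresholds.get(best_word, default_threshold)
    return best_word, differential >= threshold


scores = {"yes": -50.0, "no": -58.0, "stop": -70.0}
print(accept_recognition(scores, {"yes": 5.0}, default_threshold=10.0))
print(accept_recognition(scores, {}, default_threshold=10.0))
```

Raising a threshold rejects more utterances, and lowering it accepts more, which is the trade-off between rejection accuracy and recognition accuracy that the abstract mentions.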

4. Granted invention patent

US06216103B1 Method for implementing a speech recognition system to determine speech endpoints during conditions with background noise (status: no longer in force)
Publication No.: US06216103B1
Publication date: 2001-04-10
Application No.: US08957875
Filing date: 1997-10-20
Applicants: Duanpei Wu, Miyuki Tanaka, Ruxin Chen, Lex Olorenshaw
Inventors: Duanpei Wu, Miyuki Tanaka, Ruxin Chen, Lex Olorenshaw
IPC classification: G01L300
CPC classification: G10L25/87, G10L15/20
Abstract: A method for implementing a speech recognition system for use during conditions with background noise includes the steps of calculating, in real-time, sequential short-term delta energy parameters for speech energy from a spoken utterance, determining threshold values in the speech energy, and identifying a beginning point and an ending point for the spoken utterance based on the relationship between the threshold values and the short-term delta energy parameters.
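
The endpoint search in the abstract above can be illustrated with a simplified, offline sketch. It assumes the short-term delta energy is the absolute frame-to-frame energy difference and that the thresholds are supplied by the caller; the patent's real-time procedure and its threshold estimation are not reproduced here. The function name `find_endpoints` is hypothetical.

```python
def find_endpoints(frame_energy, start_threshold, end_threshold):
    """Return (start_frame, end_frame) of an utterance, or None if none is found.

    Assumption: delta energy for frame t is |E[t] - E[t-1]| (0 for the first frame).
    """
    deltas = [0.0] + [abs(b - a) for a, b in zip(frame_energy, frame_energy[1:])]
    # Beginning point: first frame whose delta energy exceeds the start threshold.
    start = next((i for i, d in enumerate(deltas) if d > start_threshold), None)
    if start is None:
        return None
    # Ending point: last frame at or after the start that still exceeds the end threshold.
    end = max(
        (i for i in range(start, len(deltas)) if deltas[i] > end_threshold),
        default=start,
    )
    return start, end


energy = [1.0, 1.0, 1.1, 5.0, 6.0, 2.0, 1.0, 1.0]
print(find_endpoints(energy, start_threshold=2.0, end_threshold=0.5))
```

Working on energy changes rather than absolute energy makes the decision less sensitive to a steady background noise floor, since a constant offset cancels in the difference.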

5. Granted invention patent

US06718302B1 Method for utilizing validity constraints in a speech endpoint detector (status: no longer in force)
Publication No.: US06718302B1
Publication date: 2004-04-06
Application No.: US09482396
Filing date: 2000-01-12
Applicants: Duanpei Wu, Miyuki Tanaka, Ruxin Chen, Lex Olorenshaw
Inventors: Duanpei Wu, Miyuki Tanaka, Ruxin Chen, Lex Olorenshaw
IPC classification: G10L1102
CPC classification: G10L25/87
Abstract: A method for utilizing validity constraints in a speech endpoint detector comprises a validity manager that may utilize a pulse width module to validate utterances that include a plurality of energy pulses during a certain time period. The validity manager also may utilize a minimum power module to ensure that speech energy below a pre-determined level is not classified as a valid utterance. In addition the validity manager may use a duration module to ensure that valid utterances fall within a specified duration. Finally, the validity manager may utilize a short-utterance minimum power module to specifically distinguish an utterance of short duration from background noise based on the energy level of the short utterance.
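
Three of the checks named in the abstract above (minimum power, duration, and short-utterance minimum power) can be sketched as a single predicate. The pulse width module is omitted, and every field name and limit value below is a hypothetical placeholder rather than a figure from the patent.

```python
from dataclasses import dataclass

@dataclass
class ValidityLimits:
    min_power: float        # lowest power accepted for any utterance
    min_duration: float     # seconds
    max_duration: float     # seconds
    short_duration: float   # utterances at or below this length count as "short"
    short_min_power: float  # stricter power floor applied to short utterances

def is_valid_utterance(duration, peak_power, limits):
    """Apply the power and duration constraints to one candidate utterance."""
    if peak_power < limits.min_power:
        return False
    if not (limits.min_duration <= duration <= limits.max_duration):
        return False
    # Short bursts need more power to be distinguished from background noise.
    if duration <= limits.short_duration and peak_power < limits.short_min_power:
        return False
    return True


limits = ValidityLimits(min_power=10.0, min_duration=0.1, max_duration=5.0,
                        short_duration=0.3, short_min_power=20.0)
print(is_valid_utterance(1.0, 12.0, limits))
print(is_valid_utterance(0.2, 12.0, limits))
```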

6. Granted invention patent

US6006186A Method and apparatus for a parameter sharing speech recognition system (status: no longer in force)
Publication No.: US6006186A
Publication date: 1999-12-21
Application No.: US953026
Filing date: 1997-10-16
Applicants: Ruxin Chen, Miyuki Tanaka, Duanpei Wu, Lex S. Olorenshaw
Inventors: Ruxin Chen, Miyuki Tanaka, Duanpei Wu, Lex S. Olorenshaw
IPC classification: G10L15/14, G10L15/18, G10L7/08
CPC classification: G10L15/142, G10L15/148
Abstract: A method and an apparatus for a parameter sharing speech recognition system are provided. Speech signals are received into a processor of a speech recognition system. The speech signals are processed using a speech recognition system hosting a shared hidden Markov model (HMM) produced by generating a number of phoneme models, some of which are shared. The phoneme models are generated by retaining as a separate phoneme model any triphone model having a number of trained frames available that exceeds a prespecified threshold. A shared phoneme model is generated to represent each of the groups of triphone phoneme models for which the number of trained frames having a common biphone exceed the prespecified threshold. A shared phoneme model is generated to represent each of the groups of triphone phoneme models for which the number of trained frames having an equivalent effect on a phonemic context exceed the prespecified threshold. A shared phoneme model is generated to represent each of the groups of triphone phoneme models having the same center context. The generated phoneme models are trained, and shared phoneme model states are generated that are shared among the phoneme models. Shared probability distribution functions are generated that are shared among the phoneme model states. Shared probability sub-distribution functions are generated that are shared among the phoneme model probability distribution functions. The shared phoneme model hierarchy is reevaluated for further sharing in response to the shared probability sub-distribution functions. Signals representative of the received speech signals are generated.
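
The first stages of the sharing hierarchy in the abstract above can be sketched as a back-off rule. This sketch assumes that a "common biphone" means a shared left context and center phone, skips the "equivalent effect on a phonemic context" stage, and stops before state and distribution sharing. The function name and the frame counts are hypothetical.

```python
from collections import defaultdict

def assign_models(frame_counts, threshold):
    """Map each triphone (left, center, right) to the model that will represent it."""
    assignment = {}
    biphone_total = defaultdict(int)
    for (left, center, right), frames in frame_counts.items():
        if frames > threshold:
            # Enough training frames: keep a separate triphone model.
            assignment[(left, center, right)] = ("triphone", left, center, right)
        else:
            biphone_total[(left, center)] += frames
    for triphone in frame_counts:
        if triphone in assignment:
            continue
        left, center, _ = triphone
        if biphone_total[(left, center)] > threshold:
            # Pooled frames of the common biphone are sufficient: share that model.
            assignment[triphone] = ("biphone", left, center)
        else:
            # Last resort: share one model per center phone.
            assignment[triphone] = ("center", center)
    return assignment


counts = {("k", "ae", "t"): 500, ("b", "ae", "t"): 60,
          ("b", "ae", "d"): 70, ("s", "ae", "p"): 20}
for triphone, model in assign_models(counts, threshold=100).items():
    print(triphone, "->", model)
```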

7. Granted invention patent

US08788256B2 Multiple language voice recognition (status: in force)
Publication No.: US08788256B2
Publication date: 2014-07-22
Application No.: US12698963
Filing date: 2010-02-02
Applicants: Ruxin Chen, Gustavo Hernandez-Abrego, Masanori Omote, Xavier Menendez-Pidal
Inventors: Ruxin Chen, Gustavo Hernandez-Abrego, Masanori Omote, Xavier Menendez-Pidal
IPC classification: G06F17/20
CPC classification: G10L15/187, G10L2015/025
Abstract: Computer implemented speech processing generates one or more pronunciations of an input word in a first language by a non-native speaker of the first language who is a native speaker of a second language. The input word is converted into one or more pronunciations. Each pronunciation includes one or more phonemes selected from a set of phonemes associated with the second language. Each pronunciation is associated with the input word in an entry in a computer database. Each pronunciation in the database is associated with information identifying a pronunciation language and/or a phoneme language.
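
The database entry described in the abstract above can be illustrated with a small sketch. It assumes a simple one-to-one phoneme mapping from the first language's phoneme set to the second language's; the abstract does not say how pronunciations are actually generated. The mapping table, language codes and function name are hypothetical toy values.

```python
def add_pronunciation(lexicon, word, phonemes, phoneme_map,
                      pronunciation_language, phoneme_language):
    """Store a non-native pronunciation of `word` with its language labels.

    phoneme_map: toy mapping from first-language phonemes to the closest
    second-language phonemes (an assumption made for this sketch).
    """
    mapped = tuple(phoneme_map[p] for p in phonemes)  # KeyError if unmapped
    lexicon.setdefault(word, []).append({
        "phonemes": mapped,
        "pronunciation_language": pronunciation_language,
        "phoneme_language": phoneme_language,
    })
    return mapped


lexicon = {}
toy_map = {"th": "t", "ih": "i", "ng": "n", "k": "k"}
add_pronunciation(lexicon, "think", ["th", "ih", "ng", "k"], toy_map,
                  pronunciation_language="en", phoneme_language="es")
print(lexicon["think"])
```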

8. Invention patent application

US20070061142A1 Audio, video, simulation, and user interface paradigms (status: in force)
Publication No.: US20070061142A1
Publication date: 2007-03-15
Application No.: US11522304
Filing date: 2006-09-15
Applicants: Gustavo Hernandez-Abrego, Xavier Menendez-Pidal, Steven Osman, Ruxin Chen, Rishi Deshpande, Care Michaud-Wideman, Richard Marks, Eric Larsen, Xiaodong Mao
Inventors: Gustavo Hernandez-Abrego, Xavier Menendez-Pidal, Steven Osman, Ruxin Chen, Rishi Deshpande, Care Michaud-Wideman, Richard Marks, Eric Larsen, Xiaodong Mao
IPC classification: G10L17/00
CPC classification: A63F13/428, A63F13/213, A63F13/217, G06F3/005, G06F3/012, G06F3/015, G06F2203/011, G10L17/005, G10L17/04
Abstract: Consumer electronic devices have been developed with enormous information processing capabilities, high quality audio and video outputs, large amounts of memory, and may also include wired and/or wireless networking capabilities. Additionally, relatively unsophisticated and inexpensive sensors, such as microphones, video camera, GPS or other position sensors, when coupled with devices having these enhanced capabilities, can be used to detect subtle features about users and their environments. A variety of audio, video, simulation and user interface paradigms have been developed to utilize the enhanced capabilities of these devices. These paradigms can be used separately or together in any combination. One paradigm automatically creating user identities using speaker identification. Another paradigm includes a control button with 3-axis pressure sensitivity for use with game controllers and other input devices.

9. Granted invention patent

US08825482B2 Audio, video, simulation, and user interface paradigms (status: in force)
Publication No.: US08825482B2
Publication date: 2014-09-02
Application No.: US11522304
Filing date: 2006-09-15
Applicants: Gustavo Hernandez-Abrego, Xavier Menendez-Pidal, Steven Osman, Ruxin Chen, Rishi Deshpande, Care Michaud-Wideman, Richard Marks, Eric Larsen, Xiaodong Mao
Inventors: Gustavo Hernandez-Abrego, Xavier Menendez-Pidal, Steven Osman, Ruxin Chen, Rishi Deshpande, Care Michaud-Wideman, Richard Marks, Eric Larsen, Xiaodong Mao
IPC classification: G10L17/00
CPC classification: A63F13/428, A63F13/213, A63F13/217, G06F3/005, G06F3/012, G06F3/015, G06F2203/011, G10L17/005, G10L17/04
Abstract: Consumer electronic devices have been developed with enormous information processing capabilities, high quality audio and video outputs, large amounts of memory, and may also include wired and/or wireless networking capabilities. Additionally, relatively unsophisticated and inexpensive sensors, such as microphones, video camera, GPS or other position sensors, when coupled with devices having these enhanced capabilities, can be used to detect subtle features about users and their environments. A variety of audio, video, simulation and user interface paradigms have been developed to utilize the enhanced capabilities of these devices. These paradigms can be used separately or together in any combination. One paradigm automatically creating user identities using speaker identification. Another paradigm includes a control button with 3-axis pressure sensitivity for use with game controllers and other input devices.

10. Invention patent application

US20110288869A1 ROBUSTNESS TO ENVIRONMENTAL CHANGES OF A CONTEXT DEPENDENT SPEECH RECOGNIZER (status: in force)
Publication No.: US20110288869A1
Publication date: 2011-11-24
Application No.: US12785375
Filing date: 2010-05-21
Applicants: Xavier Menendez-Pidal, Ruxin Chen
Inventors: Xavier Menendez-Pidal, Ruxin Chen
IPC classification: G10L15/14, G06F15/18, G06F17/30
CPC classification: G10L15/144, G10L15/187, G10L2015/022, G10L2015/0631
Abstract: An apparatus to improve robustness to environmental changes of a context dependent speech recognizer for an application, that includes a training database to store sounds for speech recognition training, a dictionary to store words supported by the speech recognizer, and a speech recognizer training module to train a set of one or more multiple state Hidden Markov Models (HMMs) with use of the training database and the dictionary. The speech recognizer training module performs a non-uniform state clustering process on each of the states of each HMM, which includes using a different non-uniform cluster threshold for at least some of the states of each HMM to more heavily cluster and correspondingly reduce a number of observation distributions for those of the states of each HMM that are less empirically affected by one or more contextual dependencies.
