Quick Patent Search - fast search of global patents, free commercial-use patent database - IPRDB

1. Granted invention patent

US07206741B2 Method of speech recognition using time-dependent interpolation and hidden dynamic value classes (in force)
Publication No.: US07206741B2
Publication date: 2007-04-17
Application No.: US11294858
Filing date: 2005-12-06
Applicant: Li Deng, Jian-lai Zhou, Frank Torsten Bernd Seide, Asela J. R. Gunawardana, Hagai Attias, Alejandro Acero, Xuedong Huang
Inventor: Li Deng, Jian-lai Zhou, Frank Torsten Bernd Seide, Asela J. R. Gunawardana, Hagai Attias, Alejandro Acero, Xuedong Huang
IPC classification: G10L15/04
CPC classification: G10L15/12, G10L2015/025
Abstract: A speech signal is decoded by determining a production-related value for a current state based on an optimal production-related value at the end of a preceding state, the optimal production-related value being selected from a set of continuous values. The production-related value is used to determine a likelihood of a phone being represented by a set of observation vectors that are aligned with a path between the preceding state and the current state. The likelihood of the phone is combined with a score from the preceding state to determine a score for the current state, the score from the preceding state being associated with a discrete class of production-related values wherein the class matches the class of the optimal production-related value.

2. Granted invention patent

US07050975B2 Method of speech recognition using time-dependent interpolation and hidden dynamic value classes (in force)
Publication No.: US07050975B2
Publication date: 2006-05-23
Application No.: US10267522
Filing date: 2002-10-09
Applicant: Li Deng, Jian-lai Zhou, Frank Torsten Bernd Seide, Asela J. R. Gunawardana, Hagai Attias, Alejandro Acero, Xuedong Huang
Inventor: Li Deng, Jian-lai Zhou, Frank Torsten Bernd Seide, Asela J. R. Gunawardana, Hagai Attias, Alejandro Acero, Xuedong Huang
IPC classification: G10L15/14
CPC classification: G10L15/12, G10L2015/025
Abstract: A method of speech recognition is provided that identifies a production-related dynamics value by performing a linear interpolation between a production-related dynamics value at a previous time and a production-related target using a time-dependent interpolation weight. The hidden production-related dynamics value is used to compute a predicted value that is compared to an observed value of acoustics to determine the likelihood of the observed acoustics given a sequence of hidden phonological units. In some embodiments, the production-related dynamics value at the previous time is selected from a set of continuous values. In addition, the likelihood of the observed acoustics given a sequence of hidden phonological units is combined with a score associated with a discrete class of production-related dynamic values at the previous time to determine a score for a current phonological state.
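
The interpolation step in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the scalar state, the function names, and the Gaussian scoring with a fixed variance are all assumptions made for illustration. The sketch applies z_t = w_t * z_{t-1} + (1 - w_t) * target for a sequence of time-dependent weights w_t, then scores an observation against a predicted value.

```python
import math
from typing import List


def interpolate_dynamics(start: float, target: float, weights: List[float]) -> List[float]:
    """Return z_1..z_T with z_t = w_t * z_{t-1} + (1 - w_t) * target and z_0 = start.

    The weights are the time-dependent interpolation weights; the scalar
    state is a simplifying assumption.
    """
    trajectory = []
    previous = start
    for w in weights:
        if not 0.0 <= w <= 1.0:
            raise ValueError("interpolation weight must lie in [0, 1]")
        current = w * previous + (1.0 - w) * target
        trajectory.append(current)
        previous = current
    return trajectory


def gaussian_log_likelihood(observed: float, predicted: float, variance: float) -> float:
    """Log-likelihood of an observation under a Gaussian centred on the prediction.

    Gaussian scoring is an assumption here; the abstract only says that the
    predicted and observed values are compared.
    """
    if variance <= 0.0:
        raise ValueError("variance must be positive")
    residual = observed - predicted
    return -0.5 * (math.log(2.0 * math.pi * variance) + residual * residual / variance)


if __name__ == "__main__":
    # The state moves halfway toward the target at every frame.
    print(interpolate_dynamics(0.0, 1.0, [0.5, 0.5]))  # [0.5, 0.75]
```

A weight near 1 keeps the value close to where it was at the previous time, and a weight near 0 moves it almost all the way to the target.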

3. Invention patent application

US20060085191A1 Method of speech recognition using time-dependent interpolation and hidden dynamic value classes (in force)
Publication No.: US20060085191A1
Publication date: 2006-04-20
Application No.: US11294858
Filing date: 2005-12-06
Applicant: Li Deng, Jian-lai Zhou, Frank Seide, Asela Gunawardana, Hagai Attias, Alejandro Acero, Xuedong Huang
Inventor: Li Deng, Jian-lai Zhou, Frank Seide, Asela Gunawardana, Hagai Attias, Alejandro Acero, Xuedong Huang
IPC classification: G10L15/14
CPC classification: G10L15/12, G10L2015/025
Abstract: A speech signal is decoded by determining a production-related value for a current state based on an optimal production-related value at the end of a preceding state, the optimal production-related value being selected from a set of continuous values. The production-related value is used to determine a likelihood of a phone being represented by a set of observation vectors that are aligned with a path between the preceding state and the current state. The likelihood of the phone is combined with a score from the preceding state to determine a score for the current state, the score from the preceding state being associated with a discrete class of production-related values wherein the class matches the class of the optimal production-related value.

4. Granted invention patent

US06990447B2 Method and apparatus for denoising and deverberation using variational inference and strong speech models (in force)
Publication No.: US06990447B2
Publication date: 2006-01-24
Application No.: US09999576
Filing date: 2001-11-15
Applicant: Hagai Attias, John Carlton Platt, Li Deng, Alejandro Acero
Inventor: Hagai Attias, John Carlton Platt, Li Deng, Alejandro Acero
IPC classification: G10L15/08, G10L15/12, G10L15/06, G10L21/02
CPC classification: G10L21/0208, G10L2021/02082, H04R2225/43
Abstract: A probability distribution for speech model parameters, such as auto-regression parameters, is used to identify a distribution of denoised values from a noisy signal. Under one embodiment, the probability distributions of the speech model parameters and the denoised values are adjusted to improve a variational inference so that the variational inference better approximates the joint probability of the speech model parameters and the denoised values given a noisy signal. In some embodiments, this improvement is performed during an expectation step in an expectation-maximization algorithm. The statistical model can also be used to identify an average spectrum for the clean signal and this average spectrum may be provided to a speech recognizer instead of the estimate of the clean signal.

5. Invention patent application

US20050114134A1 Method and apparatus for continuous valued vocal tract resonance tracking using piecewise linear approximations (under examination, published)
Publication No.: US20050114134A1
Publication date: 2005-05-26
Application No.: US10723995
Filing date: 2003-11-26
Applicant: Li Deng, Hagai Attias, Alejandro Acero, Leo Lee
Inventor: Li Deng, Hagai Attias, Alejandro Acero, Leo Lee
IPC classification: G10L15/10, G10L11/00, G10L15/02, G10L15/14, G10L15/28, G10L19/06
CPC classification: G10L25/48, G10L25/15
Abstract: A method and apparatus tracks vocal tract resonance components, including both frequencies and bandwidths, in a speech signal. The components are tracked by defining a state equation that is linear with respect to a past vocal tract resonance vector and that predicts a current vocal tract resonance vector. An observation equation is also defined that is linear with respect to a current vocal tract resonance vector and that predicts at least one component of an observation vector. The state equation, the observation equation, and a sequence of observation vectors are used to identify a sequence of vocal tract resonance vectors using Kalman filter algorithm. Under one embodiment, the observation equation is defined based on a piecewise linear approximation to a non-linear function. The parameters of the linear approximation are selected based on pre-defined regions, which are determined from a crude estimate of a vocal tract resonance vector.
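
The combination of a linear state equation, a linear observation equation, and a Kalman filter can be sketched in one dimension. This is a textbook scalar Kalman filter, not the patented tracker: it assumes a single resonance value instead of a vector, fixed coefficients `a` and `h` instead of region-dependent piecewise linear parameters, and hypothetical noise variances `q` and `r`.

```python
from typing import List, Tuple


def kalman_filter_1d(
    observations: List[float],
    a: float,   # state equation: x_t = a * x_{t-1} + process noise
    q: float,   # process noise variance (assumed known)
    h: float,   # observation equation: y_t = h * x_t + observation noise
    r: float,   # observation noise variance (assumed known)
    x0: float,  # initial state estimate
    p0: float,  # initial estimate variance
) -> List[Tuple[float, float]]:
    """Return the filtered (estimate, variance) pair for every observation."""
    estimates = []
    x, p = x0, p0
    for y in observations:
        # Predict the current state from the past state.
        x_pred = a * x
        p_pred = a * p * a + q
        # Correct the prediction with the new observation.
        innovation = y - h * x_pred
        innovation_var = h * p_pred * h + r
        gain = p_pred * h / innovation_var
        x = x_pred + gain * innovation
        p = (1.0 - gain * h) * p_pred
        estimates.append((x, p))
    return estimates


if __name__ == "__main__":
    for estimate, variance in kalman_filter_1d([2.0, 2.0], 1.0, 0.0, 1.0, 1.0, 0.0, 1.0):
        print(round(estimate, 4), round(variance, 4))
```

In the piecewise linear setting the abstract describes, `h` would presumably be chosen per frame from the region that a crude estimate falls into. The sketch keeps it constant for brevity.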

6. Invention patent application

US20110251844A1 GRAPHEME-TO-PHONEME CONVERSION USING ACOUSTIC DATA (in force)
Publication No.: US20110251844A1
Publication date: 2011-10-13
Application No.: US13164683
Filing date: 2011-06-20
Applicant: Xiao Li, Asela J. R. Gunawardana, Alejandro Acero
Inventor: Xiao Li, Asela J. R. Gunawardana, Alejandro Acero
IPC classification: G10L15/04
CPC classification: G10L13/08, G10L15/063, G10L15/187
Abstract: Described is the use of acoustic data to improve grapheme-to-phoneme conversion for speech recognition, such as to more accurately recognize spoken names in a voice-dialing system. A joint model of acoustics and graphonemes (acoustic data, phonemes sequences, grapheme sequences and an alignment between phoneme sequences and grapheme sequences) is described, as is retraining by maximum likelihood training and discriminative training in adapting graphoneme model parameters using acoustic data. Also described is the unsupervised collection of grapheme labels for received acoustic data, thereby automatically obtaining a substantial number of actual samples that may be used in retraining. Speech input that does not meet a confidence threshold may be filtered out so as to not be used by the retrained model.
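
The confidence filtering mentioned at the end of this abstract can be sketched as a simple selection step. The `LabeledUtterance` type, its field names, and the assumption that confidence is a number in [0, 1] are hypothetical; the abstract does not say how confidence is computed or represented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LabeledUtterance:
    grapheme_label: str  # label collected without supervision, e.g. a recognized name
    confidence: float    # recognizer confidence, assumed to lie in [0, 1]


def filter_by_confidence(
    samples: List[LabeledUtterance], threshold: float
) -> List[LabeledUtterance]:
    """Keep only the samples whose confidence meets the threshold."""
    return [s for s in samples if s.confidence >= threshold]


if __name__ == "__main__":
    collected = [LabeledUtterance("john", 0.9), LabeledUtterance("jon", 0.4)]
    kept = filter_by_confidence(collected, 0.5)
    print([s.grapheme_label for s in kept])  # ['john']
```

Only the samples that pass the filter would be handed to retraining.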

7. Invention patent application

US20090150153A1 GRAPHEME-TO-PHONEME CONVERSION USING ACOUSTIC DATA (in force)
Publication No.: US20090150153A1
Publication date: 2009-06-11
Application No.: US11952267
Filing date: 2007-12-07
Applicant: Xiao Li, Asela J. R. Gunawardana, Alejandro Acero
Inventor: Xiao Li, Asela J. R. Gunawardana, Alejandro Acero
IPC classification: G10L15/00
CPC classification: G10L13/08, G10L15/063, G10L15/187
Abstract: Described is the use of acoustic data to improve grapheme-to-phoneme conversion for speech recognition, such as to more accurately recognize spoken names in a voice-dialing system. A joint model of acoustics and graphonemes (acoustic data, phonemes sequences, grapheme sequences and an alignment between phoneme sequences and grapheme sequences) is described, as is retraining by maximum likelihood training and discriminative training in adapting graphoneme model parameters using acoustic data. Also described is the unsupervised collection of grapheme labels for received acoustic data, thereby automatically obtaining a substantial number of actual samples that may be used in retraining. Speech input that does not meet a confidence threshold may be filtered out so as to not be used by the retrained model.

8. Granted invention patent

US07103544B2 Method and apparatus for predicting word error rates from text (in force)
Publication No.: US07103544B2
Publication date: 2006-09-05
Application No.: US11146324
Filing date: 2005-06-06
Applicant: Milind Mahajan, Yonggang Deng, Alejandro Acero, Asela J. R. Gunawardana, Ciprian Chelba
Inventor: Milind Mahajan, Yonggang Deng, Alejandro Acero, Asela J. R. Gunawardana, Ciprian Chelba
IPC classification: G10L15/06, G10L15/14
CPC classification: G10L15/197, G10L15/183
Abstract: A method of modeling a speech recognition system includes decoding a speech signal produced from a training text to produce a sequence of predicted speech units. The training text comprises a sequence of actual speech units that is used with the sequence of predicted speech units to form a confusion model. In further embodiments, the confusion model is used to decode a text to identify an error rate that would be expected if the speech recognition system decoded speech based on the text.

9. Granted invention patent

US07117153B2 Method and apparatus for predicting word error rates from text (in force)
Publication No.: US07117153B2
Publication date: 2006-10-03
Application No.: US10365850
Filing date: 2003-02-13
Applicant: Milind Mahajan, Yonggang Deng, Alejandro Acero, Asela J. R. Gunawardana, Ciprian Chelba
Inventor: Milind Mahajan, Yonggang Deng, Alejandro Acero, Asela J. R. Gunawardana, Ciprian Chelba
IPC classification: G10L15/06, G10L15/14
CPC classification: G10L15/197, G10L15/183
Abstract: A method of modeling a speech recognition system includes decoding a speech signal produced from a training text to produce a sequence of predicted speech units. The training text comprises a sequence of actual speech units that is used with the sequence of predicted speech units to form a confusion model. In further embodiments, the confusion model is used to decode a text to identify an error rate that would be expected if the speech recognition system decoded speech based on the text.
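
The idea of a confusion model can be sketched as follows, under strong simplifying assumptions that are not taken from the patent: actual and predicted speech units are already aligned one to one (substitutions only, no insertions or deletions), the model is a table of relative frequencies, and a unit never seen in training is treated as always misrecognized. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence, Tuple

ConfusionModel = Dict[str, Dict[str, float]]


def build_confusion_model(pairs: Sequence[Tuple[str, str]]) -> ConfusionModel:
    """Estimate P(predicted | actual) from aligned (actual, predicted) unit pairs."""
    counts = defaultdict(Counter)
    for actual, predicted in pairs:
        counts[actual][predicted] += 1
    model: ConfusionModel = {}
    for actual, predicted_counts in counts.items():
        total = sum(predicted_counts.values())
        model[actual] = {p: c / total for p, c in predicted_counts.items()}
    return model


def expected_error_rate(model: ConfusionModel, text_units: Sequence[str]) -> float:
    """Average probability that a unit of the text is recognized incorrectly."""
    if not text_units:
        raise ValueError("text must contain at least one unit")
    expected_errors = 0.0
    for unit in text_units:
        # Units absent from the model get probability 0 of being correct (an assumption).
        p_correct = model.get(unit, {}).get(unit, 0.0)
        expected_errors += 1.0 - p_correct
    return expected_errors / len(text_units)


if __name__ == "__main__":
    aligned = [("a", "a"), ("a", "b"), ("b", "b"), ("b", "b")]
    model = build_confusion_model(aligned)
    print(expected_error_rate(model, ["a", "b"]))  # 0.25
```

The estimate needs only text as input once the model is built, which is the point the abstract makes about predicting error rates without decoding new speech.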

10. Granted invention patent

US07991615B2 Grapheme-to-phoneme conversion using acoustic data (in force)
Publication No.: US07991615B2
Publication date: 2011-08-02
Application No.: US11952267
Filing date: 2007-12-07
Applicant: Xiao Li, Asela J. R. Gunawardana, Alejandro Acero
Inventor: Xiao Li, Asela J. R. Gunawardana, Alejandro Acero
IPC classification: G10L15/04
CPC classification: G10L13/08, G10L15/063, G10L15/187
Abstract: Described is the use of acoustic data to improve grapheme-to-phoneme conversion for speech recognition, such as to more accurately recognize spoken names in a voice-dialing system. A joint model of acoustics and graphonemes (acoustic data, phonemes sequences, grapheme sequences and an alignment between phoneme sequences and grapheme sequences) is described, as is retraining by maximum likelihood training and discriminative training in adapting graphoneme model parameters using acoustic data. Also described is the unsupervised collection of grapheme labels for received acoustic data, thereby automatically obtaining a substantial number of actual samples that may be used in retraining. Speech input that does not meet a confidence threshold may be filtered out so as to not be used by the retrained model.
