Patent Quick Search: fast search of patents worldwide, a free patent database for commercial use - IPRDB

1. Granted patent

US07643989B2 Method and apparatus for vocal tract resonance tracking using nonlinear predictor and target-guided temporal restraint (active)
Publication number: US07643989B2
Publication date: 2010-01-05
Application number: US10652976
Filing date: 2003-08-29
Applicant: Li Deng, Alejandro Acero, Issam Bazzi
Inventor: Li Deng, Alejandro Acero, Issam Bazzi
IPC classification: G10L19/06
CPC classification: G10L25/48, G10L25/15
Abstract: A method and apparatus map a set of vocal tract resonant frequencies, together with their corresponding bandwidths, into a simulated acoustic feature vector in the form of LPC cepstrum by calculating a separate function for each individual vocal tract resonant frequency/bandwidth and summing the result to form an element of the simulated feature vector. The simulated feature vector is applied to a model along with an input feature vector to determine a probability that the set of vocal tract resonant frequencies is present in a speech signal. Under one embodiment, the model includes a target-guided transition model that provides a probability of a vocal tract resonant frequency based on a past vocal tract resonant frequency and a target for the vocal tract resonant frequency. Under another embodiment, the phone segmentation is provided by an HMM system and is used to precisely determine which target value to use at each frame.
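
As a rough illustration of the mapping this abstract describes, the sketch below computes one term per resonance and sums the terms to form each cepstral element. It assumes the all-pole relation c_n = sum_k (2/n) exp(-pi n b_k / f_s) cos(2 pi n f_k / f_s), which the abstract does not state. The function names, the sampling-rate argument, and the diagonal-Gaussian scoring step are illustrative assumptions, not the patented implementation.

```python
import math


def vtr_to_lpc_cepstrum(freqs_hz, bandwidths_hz, order, sample_rate_hz):
    """Map resonance frequencies and bandwidths to a simulated LPC cepstrum.

    Each cepstral element is the sum of one separate term per resonance
    (assumed all-pole relation, see the lead-in).
    """
    cepstrum = []
    for n in range(1, order + 1):
        total = 0.0
        for f, b in zip(freqs_hz, bandwidths_hz):
            decay = math.exp(-math.pi * n * b / sample_rate_hz)
            total += (2.0 / n) * decay * math.cos(2.0 * math.pi * n * f / sample_rate_hz)
        cepstrum.append(total)
    return cepstrum


def gaussian_log_likelihood(observed, simulated, variance):
    """Score an observed feature vector against the simulated one.

    Assumption: independent Gaussian residuals with a single shared variance.
    """
    total = 0.0
    for o, s in zip(observed, simulated):
        total += -0.5 * (math.log(2.0 * math.pi * variance) + (o - s) ** 2 / variance)
    return total
```

A candidate set of resonances whose simulated cepstrum lies closer to the observed cepstrum receives a higher score under this assumed model.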

2. Patent application

US20050049866A1 Method and apparatus for vocal tract resonance tracking using nonlinear predictor and target-guided temporal constraint (active)
Publication number: US20050049866A1
Publication date: 2005-03-03
Application number: US10652976
Filing date: 2003-08-29
Applicant: Li Deng, Alejandro Acero, Issam Bazzi
Inventor: Li Deng, Alejandro Acero, Issam Bazzi
IPC classification: G10L15/02, G10L11/00, G10L15/14, G10L15/08
CPC classification: G10L25/48, G10L25/15
Abstract: A method and apparatus map a set of vocal tract resonant frequencies, together with their corresponding bandwidths, into a simulated acoustic feature vector in the form of LPC cepstrum by calculating a separate function for each individual vocal tract resonant frequency/bandwidth and summing the result to form an element of the simulated feature vector. The simulated feature vector is applied to a model along with an input feature vector to determine a probability that the set of vocal tract resonant frequencies is present in a speech signal. Under one embodiment, the model includes a target-guided transition model that provides a probability of a vocal tract resonant frequency based on a past vocal tract resonant frequency and a target for the vocal tract resonant frequency. Under another embodiment, the phone segmentation is provided by an HMM system and is used to precisely determine which target value to use at each frame.
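
The target-guided transition model mentioned in this abstract can be pictured with a minimal sketch. It assumes a first-order form in which the current resonance is drawn from a Gaussian centred on a weighted mix of the previous value and the target; the `rate` and `variance` parameters, the segment tuple layout, and both function names are hypothetical and are not taken from the patent.

```python
import math


def transition_log_prob(current_hz, previous_hz, target_hz, rate=0.7, variance=2500.0):
    """Log-probability of the current resonance given the past value and a target.

    Assumed form: mean = rate * previous + (1 - rate) * target, Gaussian noise.
    """
    mean = rate * previous_hz + (1.0 - rate) * target_hz
    diff = current_hz - mean
    return -0.5 * (math.log(2.0 * math.pi * variance) + diff * diff / variance)


def target_for_frame(frame_index, segments):
    """Pick the target for a frame from a phone segmentation.

    `segments` is a list of (start_frame, end_frame_exclusive, target_hz),
    standing in for the segmentation an HMM system would provide.
    """
    for start, end, target in segments:
        if start <= frame_index < end:
            return target
    raise ValueError("frame index lies outside the segmentation")
```

With a segmentation in hand, each frame looks up its target first and then scores candidate resonance values with `transition_log_prob`.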

3. Granted patent

US07424423B2 Method and apparatus for formant tracking using a residual model (active)
Publication number: US07424423B2
Publication date: 2008-09-09
Application number: US10404411
Filing date: 2003-04-01
Applicant: Issam Bazzi, Li Deng, Alejandro Acero
Inventor: Issam Bazzi, Li Deng, Alejandro Acero
IPC classification: G10L19/04
CPC classification: G10L15/02, G10L25/15
Abstract: A method of tracking formants defines a formant search space comprising sets of formants to be searched. Formants are identified for a first frame in the speech utterance by searching the entirety of the formant search space using the codebook, and for the remaining frames by searching the same space using both the codebook and the continuity constraint across adjacent frames. Under one embodiment, the formants are identified by mapping sets of formants into feature vectors and applying the feature vectors to a model. Formants are also identified by applying dynamic programming to search for the best sequence that optimally satisfies the continuity constraint required by the model.
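
The dynamic-programming search over a codebook with a continuity constraint, as outlined in this abstract, might look roughly like the Viterbi-style sketch below. The per-frame cost table, the squared-difference continuity penalty, and the `continuity_weight` parameter are assumptions made for illustration; the patent's actual model and costs are not given in the abstract.

```python
def track_formants(frame_costs, codebook, continuity_weight=1.0):
    """Return the best codebook index per frame.

    frame_costs[t][k] is a hypothetical mismatch cost of codebook entry k at
    frame t. codebook[k] is a tuple of formant frequencies. The first frame
    uses the codebook cost alone; later frames add a continuity penalty.
    """
    def jump(a, b):
        # Assumed continuity penalty: squared change in each formant.
        return sum((x - y) ** 2 for x, y in zip(codebook[a], codebook[b]))

    n_entries = len(codebook)
    best = list(frame_costs[0])
    back = []
    for t in range(1, len(frame_costs)):
        new_best, pointers = [], []
        for k in range(n_entries):
            prev = min(range(n_entries),
                       key=lambda j: best[j] + continuity_weight * jump(j, k))
            new_best.append(best[prev] + continuity_weight * jump(prev, k)
                            + frame_costs[t][k])
            pointers.append(prev)
        best = new_best
        back.append(pointers)

    # Trace the lowest-cost sequence back from the last frame.
    k = min(range(n_entries), key=lambda j: best[j])
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return path
```

Setting `continuity_weight` to zero reduces the search to an independent per-frame choice, which makes the effect of the continuity term easy to compare.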

4. Patent application

US20060047506A1 Greedy algorithm for identifying values for vocal tract resonance vectors (active)
Publication number: US20060047506A1
Publication date: 2006-03-02
Application number: US10925585
Filing date: 2004-08-25
Applicant: Li Deng, Alejandro Acero, Issam Bazzi
Inventor: Li Deng, Alejandro Acero, Issam Bazzi
IPC classification: G10L19/06
CPC classification: G10L25/48, G10L15/02, G10L25/15
Abstract: A method and apparatus identify values for components of a vocal tract resonance vector by sequentially determining values for each component of the vocal tract resonance vector. To determine a value for a component, the other components are set to static values. A plurality of values for a function are then determined using a plurality of values for the component that is being determined while using the static values for all of the other components. One of the plurality of values for the component is then selected based on the plurality of values for the function.
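
The component-by-component procedure in this abstract can be sketched as a simple coordinate-wise greedy search. The `score` callable, the candidate lists, and the choice to keep the highest-scoring value are assumptions for illustration; the abstract only says a value is selected based on the function values.

```python
def greedy_vtr_search(score, candidates, initial):
    """Determine each component in turn while the others stay fixed.

    score: hypothetical function of a full component list (higher is better).
    candidates: one list of candidate values per component.
    initial: static starting values for all components.
    """
    current = list(initial)
    for i, values in enumerate(candidates):
        best_value, best_score = current[i], None
        for v in values:
            # Vary only component i; every other component keeps its static value.
            trial = current[:i] + [v] + current[i + 1:]
            s = score(trial)
            if best_score is None or s > best_score:
                best_value, best_score = v, s
        current[i] = best_value
    return current
```

The cost is the sum of the candidate-list lengths rather than their product, which is the usual motivation for a greedy search over a joint one.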

5. Granted patent

US08214215B2 Phase sensitive model adaptation for noisy speech recognition (active)
Publication number: US08214215B2
Publication date: 2012-07-03
Application number: US12236530
Filing date: 2008-09-24
Applicant: Jinyu Li, Li Deng, Dong Yu, Yifan Gong, Alejandro Acero
Inventor: Jinyu Li, Li Deng, Dong Yu, Yifan Gong, Alejandro Acero
IPC classification: G10L15/14
CPC classification: G10L15/065, G10L15/20
Abstract: A speech recognition system described herein includes a receiver component that receives a distorted speech utterance. The speech recognition system also includes an updater component that is in communication with a first model and a second model, wherein the updater component automatically updates parameters of the second model based at least in part upon joint estimates of additive and convolutive distortions output by the first model, wherein the joint estimates of additive and convolutive distortions are estimates of distortions based on a phase-sensitive model in the speech utterance received by the receiver component. Further, distortions other than additive and convolutive distortions, including other stationary and nonstationary sources, can also be estimated and used to update the parameters of the second model.
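
To make the idea of a phase-sensitive distortion model concrete, the sketch below combines clean speech, a convolutive channel, and additive noise in the log-power domain. It assumes the commonly written form y = log(e^(x+h) + e^n + 2*alpha*e^((x+h+n)/2)), where alpha stands for the phase-dependent cross term. That formula, the function name, and the argument names are assumptions; the abstract does not give the model's equations.

```python
import math


def distorted_log_power(x, h, n, alpha=0.0):
    """Log power of distorted speech under an assumed phase-sensitive model.

    x: clean-speech log power, h: channel (convolutive) log gain,
    n: additive-noise log power, alpha: phase factor in [-1, 1].
    alpha = 0 gives the phase-insensitive model.
    """
    if not -1.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [-1, 1]")
    clean = math.exp(x + h)
    noise = math.exp(n)
    cross = 2.0 * alpha * math.exp(0.5 * (x + h + n))
    total = clean + noise + cross
    if total <= 0.0:
        # Complete cancellation (alpha = -1 with equal powers) has no finite log.
        raise ValueError("speech and noise cancel exactly; log power is undefined")
    return math.log(total)
```

An adaptation scheme of the kind the abstract describes would estimate h and n jointly and then push model parameters through a relation like this one; that estimation step is outside this sketch.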

6. Granted patent

US08160878B2 Piecewise-based variable-parameter Hidden Markov Models and the training thereof (active)
Publication number: US08160878B2
Publication date: 2012-04-17
Application number: US12211114
Filing date: 2008-09-16
Applicant: Dong Yu, Li Deng, Yifan Gong, Alejandro Acero
Inventor: Dong Yu, Li Deng, Yifan Gong, Alejandro Acero
IPC classification: G10L15/14, G10L15/20
CPC classification: G10L15/144
Abstract: A speech recognition system uses Gaussian mixture variable-parameter hidden Markov models (VPHMMs) to recognize speech under many different conditions. Each Gaussian mixture component of the VPHMMs is characterized by a mean parameter μ and a variance parameter Σ. Each of these Gaussian parameters varies as a function of at least one environmental conditioning parameter, such as, but not limited to, instantaneous signal-to-noise-ratio (SNR). The way in which a Gaussian parameter varies with the environmental conditioning parameter(s) can be approximated as a piecewise function, such as a cubic spline function. Further, the recognition system formulates the mean parameter μ and the variance parameter Σ of each Gaussian mixture component in an efficient form that accommodates the use of discriminative training and parameter sharing. Parameter sharing is carried out so that the otherwise very large number of parameters in the VPHMMs can be effectively reduced with practically feasible amounts of training data.
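
A Gaussian mean that varies with SNR through a cubic spline, as this abstract describes, can be illustrated with a small natural cubic spline. The knot positions, the natural (zero second derivative) end conditions, and the clamping outside the knot range are assumptions chosen for the sketch; the patent's efficient parameterization is not reproduced here.

```python
import bisect


def natural_cubic_spline(knots, values):
    """Return f(x) that interpolates (knots, values) with a natural cubic spline."""
    n = len(knots)
    h = [knots[i + 1] - knots[i] for i in range(n - 1)]
    m = [0.0] * n  # second derivatives at the knots; both ends stay zero
    if n > 2:
        diag = [2.0 * (h[j] + h[j + 1]) for j in range(n - 2)]
        rhs = [6.0 * ((values[j + 2] - values[j + 1]) / h[j + 1]
                      - (values[j + 1] - values[j]) / h[j]) for j in range(n - 2)]
        # Forward elimination on the tridiagonal system (Thomas algorithm).
        for j in range(1, n - 2):
            w = h[j] / diag[j - 1]
            diag[j] -= w * h[j]
            rhs[j] -= w * rhs[j - 1]
        # Back substitution.
        m[n - 2] = rhs[-1] / diag[-1]
        for j in range(n - 4, -1, -1):
            m[j + 1] = (rhs[j] - h[j + 1] * m[j + 2]) / diag[j]

    def evaluate(x):
        x = min(max(x, knots[0]), knots[-1])  # clamp outside the knot range
        i = min(bisect.bisect_right(knots, x) - 1, n - 2)
        a, b = knots[i + 1] - x, x - knots[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (values[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (values[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)

    return evaluate
```

For example, `mean_at = natural_cubic_spline([0.0, 10.0, 20.0, 30.0], [1.0, 3.0, 2.0, 5.0])` gives a function whose value at a frame's instantaneous SNR would serve as that frame's Gaussian mean in this simplified picture.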

7. Granted patent

US08145488B2 Parameter clustering and sharing for variable-parameter hidden Markov models (active)
Publication number: US08145488B2
Publication date: 2012-03-27
Application number: US12211115
Filing date: 2008-09-16
Applicant: Dong Yu, Li Deng, Yifan Gong, Alejandro Acero
Inventor: Dong Yu, Li Deng, Yifan Gong, Alejandro Acero
IPC classification: G10L15/14
CPC classification: G10L15/142
Abstract: A speech recognition system uses Gaussian mixture variable-parameter hidden Markov models (VPHMMs) to recognize speech. The VPHMMs include Gaussian parameters that vary as a function of at least one environmental conditioning parameter. The relationship of each Gaussian parameter to the environmental conditioning parameter(s) is modeled using a piecewise fitting approach, such as by using spline functions. In a training phase, the recognition system can use clustering to identify classes of spline functions, each class grouping together spline functions which are similar to each other based on some distance measure. The recognition system can then store sets of spline parameters that represent respective classes of spline functions. An instance of a spline function that belongs to a class can make reference to an associated shared set of spline parameters. The Gaussian parameters can be represented in an efficient form that accommodates the use of sharing in the above-summarized manner.
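
The clustering and sharing step in this abstract can be illustrated with a deliberately simple stand-in. The sketch uses leader clustering with Euclidean distance over spline parameter vectors, which is not claimed to be the clustering method in the patent; the threshold and the data layout are hypothetical.

```python
import math


def cluster_splines(spline_params, threshold):
    """Group similar spline parameter vectors and share one set per class.

    Returns (shared_sets, assignment): shared_sets holds one representative
    parameter vector per class, and assignment[i] is the index of the shared
    set that spline instance i refers to.
    """
    shared_sets, assignment = [], []
    for params in spline_params:
        for idx, representative in enumerate(shared_sets):
            if math.dist(params, representative) <= threshold:
                assignment.append(idx)
                break
        else:
            # No existing class is close enough, so start a new one.
            shared_sets.append(list(params))
            assignment.append(len(shared_sets) - 1)
    return shared_sets, assignment
```

Storage then scales with the number of classes rather than the number of spline instances, since each instance keeps only an index into `shared_sets`.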

8. Granted patent

US07734460B2 Time asynchronous decoding for long-span trajectory model (lapsed)
Publication number: US07734460B2
Publication date: 2010-06-08
Application number: US11311951
Filing date: 2005-12-20
Applicant: Dong Yu, Li Deng, Alejandro Acero
Inventor: Dong Yu, Li Deng, Alejandro Acero
IPC classification: G06F17/21, G06F17/27, G10L15/00
CPC classification: G10L15/08, G10L15/187
Abstract: A time-asynchronous lattice-constrained search algorithm is developed and used to process a linguistic model of speech that has a long-contextual-span capability. In the algorithm, nodes and links in the lattices developed from the model are expanded via look-ahead. Heuristics as utilized by a search algorithm are estimated. Additionally, pruning strategies can be applied to speed up the search.
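
One generic way to picture a lattice-constrained search that uses heuristics and pruning is a best-first (A*-style) search, sketched below. This is a textbook stand-in, not the algorithm of the patent: the lattice representation, the `heuristic` callable, and the `beam` limit are all assumptions. It is time-asynchronous only in the sense that hypotheses ending at different points of the lattice compete in a single priority queue.

```python
import heapq


def best_path(lattice, start, goal, heuristic, beam=None):
    """Best-first search through a lattice.

    lattice: dict mapping a node to a list of (next_node, link_cost).
    heuristic(node): estimate of the remaining cost to the goal; the result
    is optimal only if the estimate never overestimates and no pruning occurs.
    beam: optional cap on the number of queued hypotheses (pruning).
    Returns (cost, path) or None if the goal is unreachable.
    """
    frontier = [(heuristic(start), 0.0, start, [start])]
    settled = {}
    while frontier:
        if beam is not None and len(frontier) > beam:
            # Prune: keep only the most promising hypotheses.
            frontier = heapq.nsmallest(beam, frontier)
            heapq.heapify(frontier)
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in settled and settled[node] <= cost:
            continue
        settled[node] = cost
        for nxt, link_cost in lattice.get(node, []):
            new_cost = cost + link_cost
            heapq.heappush(frontier,
                           (new_cost + heuristic(nxt), new_cost, nxt, path + [nxt]))
    return None
```

A tighter heuristic expands fewer nodes, and a smaller `beam` trades accuracy for speed, which mirrors the roles the abstract gives to heuristics and pruning.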

9. Granted patent

US07725314B2 Method and apparatus for constructing a speech filter using estimates of clean speech and noise (active)
Publication number: US07725314B2
Publication date: 2010-05-25
Application number: US10780177
Filing date: 2004-02-16
Applicant: Jian Wu, James G. Droppo, Li Deng, Alejandro Acero
Inventor: Jian Wu, James G. Droppo, Li Deng, Alejandro Acero
IPC classification: G10L15/20, G10L21/02, G10L15/00, H04B15/00
CPC classification: G10L21/0208
Abstract: A method and apparatus identify a clean speech signal from a noisy speech signal. To do this, a clean speech value and a noise value are estimated from the noisy speech signal. The clean speech value and the noise value are then used to define a gain on a filter. The noisy speech signal is applied to the filter to produce the clean speech signal. Under some embodiments, the noise value and the clean speech value are used in both the numerator and the denominator of the filter gain, with the numerator being guaranteed to be positive.
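
The overall flow in this abstract (estimate clean speech and noise, derive a gain, apply it to the noisy signal) can be sketched with the classic Wiener gain. This is a swapped-in textbook formula: in it only the clean-speech estimate appears in the numerator, so it differs from the embodiment in which both values appear in the numerator and denominator. The per-bin power inputs and the `floor` parameter are assumptions.

```python
def wiener_gain(clean_power, noise_power, floor=1e-3):
    """Per-bin gain from estimated clean-speech and noise powers.

    Uses G = S / (S + N), with a small floor so the gain stays positive.
    """
    gains = []
    for s, n in zip(clean_power, noise_power):
        s, n = max(s, 0.0), max(n, 0.0)
        denom = s + n
        g = s / denom if denom > 0.0 else floor
        gains.append(max(g, floor))
    return gains


def apply_gain(noisy_spectrum, gains):
    """Scale each bin of the noisy spectrum by its gain."""
    return [g * y for g, y in zip(gains, noisy_spectrum)]
```

Bins where the noise estimate dominates are attenuated strongly, while bins dominated by speech pass nearly unchanged.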

10. Granted patent

US07565292B2 Quantitative model for formant dynamics and contextually assimilated reduction in fluent speech (active)
Publication number: US07565292B2
Publication date: 2009-07-21
Application number: US10944262
Filing date: 2004-09-17
Applicant: Li Deng, Alejandro Acero, Dong Yu
Inventor: Li Deng, Alejandro Acero, Dong Yu
IPC classification: G10L13/00
CPC classification: G10L13/02, G10L25/15
Abstract: A method of identifying a sequence of formant trajectory values is provided in which a sequence of target values is identified for a formant as step functions. The target values and the duration for each segment target for the formant are applied to a finite impulse response filter to form a sequence of formant trajectory values. The parameters of this filter, as well as the duration of the targets for each phone, can be modified to produce many kinds of target undershooting effects in a contextually assimilated manner. The procedure for producing the formant trajectory values does not require any acoustic data from speech.
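
The step-function targets and FIR smoothing in this abstract can be sketched as follows. The symmetric, exponentially decaying filter shape, the `gamma` and `half_width` parameters, and the renormalization at the utterance edges are assumptions for illustration; the abstract states only that a finite impulse response filter is used.

```python
def step_targets(segments):
    """Expand (target_hz, duration_frames) pairs into a per-frame step sequence."""
    sequence = []
    for target, duration in segments:
        sequence.extend([target] * duration)
    return sequence


def fir_trajectory(targets, gamma=0.6, half_width=3):
    """Smooth a step-target sequence with a symmetric FIR filter.

    Assumed weights: gamma ** |k| for k in [-half_width, half_width],
    renormalized over the taps that fall inside the sequence.
    """
    trajectory = []
    for t in range(len(targets)):
        num = den = 0.0
        for k in range(-half_width, half_width + 1):
            if 0 <= t + k < len(targets):
                w = gamma ** abs(k)
                num += w * targets[t + k]
                den += w
        trajectory.append(num / den)
    return trajectory
```

With a short segment between two long ones, for instance `step_targets([(500.0, 10), (700.0, 2), (500.0, 10)])`, the filtered trajectory rises toward the middle target without reaching it, which is the undershoot effect the abstract refers to. No acoustic data enters the computation.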
