Patent Quick Search - Quickly search patents worldwide, a free patent database for commercial use - IPRDB

31. Granted invention patent

US07409346B2 Two-stage implementation for phonetic recognition using a bi-directional target-filtering model of speech coarticulation and reduction (In force)
Publication No.: US07409346B2
Publication Date: 2008-08-05
Application No.: US11069474
Filing Date: 2005-03-01
Applicants: Alejandro Acero, Dong Yu, Li Deng
Inventors: Alejandro Acero, Dong Yu, Li Deng
IPC Classification: G10L15/10
CPC Classification: G10L15/02, G10L25/15, G10L25/24, G10L2015/025
Abstract: A structured generative model of speech coarticulation and reduction is described, with a novel two-stage implementation. At the first stage, the dynamics of formants or vocal tract resonances (VTR) are generated using prior information about the resonance targets in the phone sequence. Bi-directional temporal filtering with a finite impulse response (FIR) filter is applied to the segmental target sequence, which serves as the FIR filter's input. At the second stage, the dynamics of speech cepstra are predicted analytically from the FIR-filtered VTR targets. The combined system of these two stages thus generates correlated and causally related VTR and cepstral dynamics, in which phonetic reduction is represented explicitly in the hidden resonance space and implicitly in the observed cepstral space. The combined system also yields the acoustic observation probability for a given phone sequence. Using this probability, different phone sequences can be compared and ranked by their respective probability values. This in turn permits the model to be used for phonetic recognition.
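
The first-stage idea, a bi-directional FIR filter that turns a step-like sequence of segmental VTR targets into a smooth trajectory, can be sketched in a few lines of Python. This is a one-dimensional illustration under assumptions not taken from the abstract: a normalized symmetric kernel proportional to gamma**|tau| with a single shared gamma, a fixed half-width `span`, edge frames that repeat the boundary target, and hypothetical helper names `segment_targets` and `bidirectional_fir`. It is not the patented implementation.

```python
def segment_targets(segments):
    """Expand (target_value, n_frames) pairs into a frame-level step sequence."""
    return [value for value, n_frames in segments for _ in range(n_frames)]


def bidirectional_fir(targets, gamma=0.6, span=7):
    """Filter a frame-level target sequence with a normalized kernel gamma**|tau|.

    The kernel looks both backward and forward in time (tau = -span..span),
    so each output frame is pulled toward neighbouring targets on both sides.
    """
    taps = range(-span, span + 1)
    weights = [gamma ** abs(tau) for tau in taps]
    norm = sum(weights)  # normalize so a constant target passes through unchanged
    n = len(targets)
    trajectory = []
    for k in range(n):
        acc = 0.0
        for w, tau in zip(weights, taps):
            j = min(max(k + tau, 0), n - 1)  # assumption: repeat edge targets
            acc += w * targets[j]
        trajectory.append(acc / norm)
    return trajectory


# Example: a short 1500 Hz segment between two 500 Hz segments.
trajectory = bidirectional_fir(segment_targets([(500.0, 10), (1500.0, 3), (500.0, 10)]))
```

With these settings the three-frame 1500 Hz segment peaks at roughly 1062 Hz, an undershoot of the kind the abstract calls phonetic reduction; a longer middle segment would get closer to its target.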

32. Invention application

US20080177547A1 Integrated speech recognition and semantic classification (In force)
Publication No.: US20080177547A1
Publication Date: 2008-07-24
Application No.: US11655703
Filing Date: 2007-01-19
Applicants: Sibel Yaman, Li Deng, Dong Yu, Ye-Yi Wang, Alejandro Acero
Inventors: Sibel Yaman, Li Deng, Dong Yu, Ye-Yi Wang, Alejandro Acero
IPC Classification: G10L15/18
CPC Classification: G10L15/1815
Abstract: A novel system integrates speech recognition and semantic classification, so that acoustic scores in a speech recognizer that accepts spoken utterances may be taken into account when training both language models and semantic classification models. For example, a joint association score may be defined that is indicative of a correspondence of a semantic class and a word sequence for an acoustic signal. The joint association score may incorporate parameters such as weighting parameters for signal-to-class modeling of the acoustic signal, language model parameters and scores, and acoustic model parameters and scores. The parameters may be revised to raise the joint association score of a target word sequence with a target semantic class relative to the joint association score of a competitor word sequence with the target semantic class. The parameters may be designed so that the semantic classification errors in the training data are minimized.
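
One way to picture the joint association score and the competitor-based revision is a linear score over an acoustic log score, a language-model log score, and class/word weights, adjusted with a perceptron-style step whenever a competitor word sequence scores at least as high as the target for the target class. The feature set, the dictionary layout, the learning rate, the class label `FLIGHT`, and the perceptron rule are all assumptions for illustration; the abstract does not specify the scoring function or the optimizer, and its error-minimizing training criterion may differ.

```python
from collections import defaultdict


def joint_score(hyp, cls, params):
    """Hypothetical joint association score of a word sequence with a semantic class.

    hyp: dict with 'words' (list), 'am' (acoustic log score), 'lm' (language-model log score).
    """
    score = params["w_am"] * hyp["am"] + params["w_lm"] * hyp["lm"]
    score += sum(params["class_word"][(cls, w)] for w in hyp["words"])
    return score


def update(target, competitor, cls, params, lr=0.1):
    """Perceptron-style step; returns True if the parameters were changed."""
    if joint_score(target, cls, params) > joint_score(competitor, cls, params):
        return False  # the target already wins for this class
    params["w_am"] += lr * (target["am"] - competitor["am"])
    params["w_lm"] += lr * (target["lm"] - competitor["lm"])
    for w in target["words"]:
        params["class_word"][(cls, w)] += lr
    for w in competitor["words"]:
        params["class_word"][(cls, w)] -= lr
    return True


params = {"w_am": 1.0, "w_lm": 1.0, "class_word": defaultdict(float)}
target = {"words": ["flights", "to", "boston"], "am": -120.0, "lm": -8.0}
competitor = {"words": ["flights", "to", "austin"], "am": -119.0, "lm": -8.5}
steps = 0
while steps < 20 and update(target, competitor, "FLIGHT", params):
    steps += 1
```

Here the competitor initially outscores the target on acoustics alone; two updates shift weight toward the language model and the class/word features until the target wins.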

33. Granted invention patent

US07254536B2 Method of noise reduction using correction and scaling vectors with partitioning of the acoustic space in the domain of noisy speech (In force)
Publication No.: US07254536B2
Publication Date: 2007-08-07
Application No.: US11059036
Filing Date: 2005-02-16
Applicants: Li Deng, Xuedong Huang, Alejandro Acero
Inventors: Li Deng, Xuedong Huang, Alejandro Acero
IPC Classification: G10L21/02
CPC Classification: G10L21/0208
Abstract: A method and apparatus are provided for reducing noise in a training signal and/or test signal. The noise reduction technique uses a stereo signal formed of two channel signals, each channel containing the same pattern signal. One of the channel signals is “clean” and the other includes additive noise. Using feature vectors from these channel signals, a collection of noise correction and scaling vectors is determined. When a feature vector of a noisy pattern signal is later received, it is multiplied by the best scaling vector for that feature vector, and the best correction vector is added to the product to produce a noise-reduced feature vector. Under one embodiment, the best scaling and correction vectors are identified by choosing an optimal mixture component for the noisy feature vector. The optimal mixture component is selected based on a distribution of noisy-channel feature vectors associated with each mixture component.
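
The enhancement step (pick the mixture component that best explains the noisy vector, then apply that component's scaling and correction vectors) can be sketched with diagonal Gaussians. The per-dimension least-squares fit of clean = s * noisy + r over stereo frames is an assumed training rule, and the mixture parameters and function names are hypothetical; the abstract does not say how the vectors are estimated.

```python
import math


def log_gauss_diag(y, mean, var):
    """Log density of y under a diagonal-covariance Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (yi - m) ** 2 / v)
               for yi, m, v in zip(y, mean, var))


def best_component(y, mixture):
    """Index of the mixture component with the highest weighted likelihood for noisy y."""
    return max(range(len(mixture)),
               key=lambda k: math.log(mixture[k]["weight"])
               + log_gauss_diag(y, mixture[k]["mean"], mixture[k]["var"]))


def fit_scale_and_correction(noisy, clean):
    """Per-dimension least-squares fit of clean ~= s * noisy + r over stereo frames."""
    n = len(noisy)
    scale, correction = [], []
    for d in range(len(noisy[0])):
        ys = [v[d] for v in noisy]
        xs = [v[d] for v in clean]
        my, mx = sum(ys) / n, sum(xs) / n
        var = sum((y - my) ** 2 for y in ys)
        cov = sum((y - my) * (x - mx) for y, x in zip(ys, xs))
        s = cov / var if var > 0 else 1.0
        scale.append(s)
        correction.append(mx - s * my)
    return scale, correction


def enhance(y, mixture, scales, corrections):
    """Noise-reduced vector: best scaling vector times y, plus best correction vector."""
    k = best_component(y, mixture)
    return [s * yi + r for yi, s, r in zip(y, scales[k], corrections[k])]
```

In use, stereo frames would first be grouped by `best_component` on their noisy half, and `fit_scale_and_correction` run once per group to obtain that component's vectors.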

34. Invention application

US20060100862A1 Acoustic models with structured hidden dynamics with integration over many possible hidden trajectories (In force)
Publication No.: US20060100862A1
Publication Date: 2006-05-11
Application No.: US11071904
Filing Date: 2005-03-01
Applicants: Li Deng, Alejandro Acero, Dong Yu, Xiang Li
Inventors: Li Deng, Alejandro Acero, Dong Yu, Xiang Li
IPC Classification: G10L11/04
CPC Classification: G10L15/02, G10L2015/025
Abstract: A method is provided for producing at least one possible vocal tract resonance (VTR) sequence for a fixed sequence of phonetic units, and for producing the acoustic observation probability by integrating over the distributions of such sequences. The method includes identifying a sequence of target distributions for a VTR sequence corresponding to a phone sequence with a given segmentation. The sequence of target distributions is applied to a finite impulse response filter to produce distributions for possible VTR trajectories. These distributions are then applied to a linearized nonlinear function to produce the acoustic observation probability for the given sequence of phonetic units. This acoustic observation probability is used for phonetic recognition.
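
The chain in this abstract (Gaussian targets, a linear FIR filter, then a linearized nonlinear mapping) keeps everything Gaussian, so the observation probability has a closed form. The one-dimensional sketch below assumes one Gaussian target per segment, shared by all of its frames and independent across segments; the same gamma**|tau| kernel for every phone; a single resonance; and an illustrative resonance-to-cepstrum mapping `vtr_to_cep` of the form (2/n) * exp(-pi*n*b/fs) * cos(2*pi*n*f/fs) with a fixed bandwidth b. These choices, the residual variance, and the function names are assumptions rather than details from the abstract.

```python
import math


def fir_weights(gamma=0.6, span=7):
    """Normalized symmetric FIR taps proportional to gamma**|tau|, tau = -span..span."""
    taps = [gamma ** abs(t) for t in range(-span, span + 1)]
    total = sum(taps)
    return [w / total for w in taps]


def vtr_distribution(segments, k, gamma=0.6, span=7):
    """Mean and variance of the filtered VTR value at frame k.

    segments: list of (target_mean, target_var, n_frames). Each segment's target is
    one Gaussian variable shared by its frames (assumption), independent of the others.
    """
    seg_of_frame = [s for s, (_, _, n) in enumerate(segments) for _ in range(n)]
    last = len(seg_of_frame) - 1
    coeff = [0.0] * len(segments)  # total filter weight landing on each segment's target
    for w, t in zip(fir_weights(gamma, span), range(-span, span + 1)):
        coeff[seg_of_frame[min(max(k + t, 0), last)]] += w
    mean = sum(a * m for a, (m, _, _) in zip(coeff, segments))
    var = sum(a * a * v for a, (_, v, _) in zip(coeff, segments))
    return mean, var


def vtr_to_cep(f, n=1, bandwidth=100.0, fs=8000.0):
    """Illustrative nonlinear map from one resonance frequency (Hz) to the n-th cepstrum."""
    return (2.0 / n) * math.exp(-math.pi * n * bandwidth / fs) * math.cos(2 * math.pi * n * f / fs)


def vtr_to_cep_slope(f, n=1, bandwidth=100.0, fs=8000.0):
    """Derivative of vtr_to_cep with respect to f, used for the linearization."""
    return (-(2.0 / n) * math.exp(-math.pi * n * bandwidth / fs)
            * math.sin(2 * math.pi * n * f / fs) * 2 * math.pi * n / fs)


def obs_log_prob(o, segments, k, resid_var=1e-3):
    """log p(o | segments) at frame k after linearizing vtr_to_cep around the VTR mean."""
    mean_z, var_z = vtr_distribution(segments, k)
    slope = vtr_to_cep_slope(mean_z)
    var_o = slope * slope * var_z + resid_var  # integrates out the hidden VTR value
    return -0.5 * (math.log(2 * math.pi * var_o) + (o - vtr_to_cep(mean_z)) ** 2 / var_o)
```

Summing `obs_log_prob` over frames for two candidate phone sequences and comparing the totals corresponds to the ranking step the abstract mentions; a real system would work with cepstral vectors rather than one scalar.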

35. Invention application

US20050256706A1 Removing noise from feature vectors (In force)
Publication No.: US20050256706A1
Publication Date: 2005-11-17
Application No.: US11185522
Filing Date: 2005-07-20
Applicants: Brendan Frey, Alejandro Acero, Li Deng
Inventors: Brendan Frey, Alejandro Acero, Li Deng
IPC Classification: G10L15/02, G10L15/20, G10L21/00
CPC Classification: G10L15/02, G10L15/20
Abstract: A method and computer-readable medium are provided for identifying clean signal feature vectors from noisy signal feature vectors. One aspect of the invention includes using an iterative approach to identify the clean signal feature vector. Another aspect of the invention includes using the variance of a set of noise feature vectors and/or channel distortion feature vectors when identifying the clean signal feature vectors.
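
A scalar sketch of an iterative clean-feature estimate is below. It assumes a log-spectral model without channel distortion, exp(y) = exp(x) + exp(n); Gaussian priors on the clean value x and the noise n (which is where a noise variance enters); and re-linearization of the model around the current posterior means on every pass, in the style of vector Taylor series methods. The model, the priors, and the names `log_add` and `estimate_clean` are assumptions; the abstract does not state the exact iteration.

```python
import math


def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def estimate_clean(y, mu_x, var_x, mu_n, var_n, iters=20):
    """Iteratively estimate a clean log-spectral value x from a noisy value y.

    Assumed model: y = log(exp(x) + exp(n)), x ~ N(mu_x, var_x), n ~ N(mu_n, var_n).
    """
    x0, n0 = y, mu_n  # initial expansion point
    for _ in range(iters):
        g0 = log_add(x0, n0)
        gx = math.exp(x0 - g0)  # dy/dx at the expansion point; dy/dn = 1 - gx
        # Linear-Gaussian approximation: y ~ g0 + gx*(x - x0) + (1 - gx)*(n - n0)
        y_pred = g0 + gx * (mu_x - x0) + (1 - gx) * (mu_n - n0)
        s = gx * gx * var_x + (1 - gx) ** 2 * var_n  # predicted variance of y
        innovation = y - y_pred
        # Posterior means under the linearized model become the next expansion point.
        x0 = mu_x + gx * var_x / s * innovation
        n0 = mu_n + (1 - gx) * var_n / s * innovation
    return x0
```

With a nearly flat clean prior and a nearly known noise level, the iteration reduces to Newton's method on the mixing equation and returns log(exp(y) - exp(n)); a tighter prior or a larger noise variance pulls the estimate toward mu_x.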

36. Invention application

US20050114134A1 Method and apparatus for continuous valued vocal tract resonance tracking using piecewise linear approximations (Pending, published)
Publication No.: US20050114134A1
Publication Date: 2005-05-26
Application No.: US10723995
Filing Date: 2003-11-26
Applicants: Li Deng, Hagai Attias, Alejandro Acero, Leo Lee
Inventors: Li Deng, Hagai Attias, Alejandro Acero, Leo Lee
IPC Classification: G10L15/10, G10L11/00, G10L15/02, G10L15/14, G10L15/28, G10L19/06
CPC Classification: G10L25/48, G10L25/15
Abstract: A method and apparatus tracks vocal tract resonance components, including both frequencies and bandwidths, in a speech signal. The components are tracked by defining a state equation that is linear with respect to a past vocal tract resonance vector and that predicts a current vocal tract resonance vector. An observation equation is also defined that is linear with respect to a current vocal tract resonance vector and that predicts at least one component of an observation vector. The state equation, the observation equation, and a sequence of observation vectors are used to identify a sequence of vocal tract resonance vectors using a Kalman filter algorithm. Under one embodiment, the observation equation is defined based on a piecewise linear approximation to a nonlinear function. The parameters of the linear approximation are selected based on predefined regions, which are determined from a crude estimate of a vocal tract resonance vector.
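
The tracking loop can be sketched as a scalar Kalman filter whose observation equation is re-linearized at every frame by choosing one piece of a piecewise-linear approximation. The scalar state, the state equation x_t = phi*x_(t-1) + (1 - phi)*target + noise, the chord-based pieces, the use of the predicted state as the "crude estimate", and all names and constants are assumptions; the abstract tracks vectors of frequencies and bandwidths and does not specify these choices.

```python
def chord_pieces(h, breakpoints):
    """Piecewise-linear (chord) approximation of h: list of (lo, hi, slope, intercept)."""
    pieces = []
    for lo, hi in zip(breakpoints, breakpoints[1:]):
        slope = (h(hi) - h(lo)) / (hi - lo)
        pieces.append((lo, hi, slope, h(lo) - slope * lo))
    return pieces


def select_piece(pieces, crude):
    """Slope and intercept of the region containing the crude estimate."""
    for lo, hi, slope, intercept in pieces:
        if lo <= crude < hi:
            return slope, intercept
    first, last = pieces[0], pieces[-1]  # outside the table: use the nearest end piece
    return (first[2], first[3]) if crude < first[0] else (last[2], last[3])


def track(observations, pieces, x0, p0, phi=0.9, target=0.0, q=1.0, r=0.01):
    """Scalar Kalman filter with a per-frame piecewise-linear observation equation."""
    x, p = x0, p0
    path = []
    for o in observations:
        # Predict with the linear state equation.
        x_pred = phi * x + (1 - phi) * target
        p_pred = phi * phi * p + q
        # Linearize the observation equation in the region of the predicted state.
        H, d = select_piece(pieces, x_pred)
        s = H * H * p_pred + r
        k = p_pred * H / s
        x = x_pred + k * (o - (H * x_pred + d))
        p = (1 - k * H) * p_pred
        path.append(x)
    return path
```

Finer breakpoints make the chords closer to the true nonlinear function at the cost of a larger region table.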

37. Granted invention patent

US08489529B2 Deep convex network with joint use of nonlinear random projection, Restricted Boltzmann Machine and batch-based parallelizable optimization (In force)
Publication No.: US08489529B2
Publication Date: 2013-07-16
Application No.: US13077978
Filing Date: 2011-03-31
Applicants: Li Deng, Dong Yu, Alejandro Acero
Inventors: Li Deng, Dong Yu, Alejandro Acero
IPC Classification: G06N5/00
CPC Classification: G06N3/08, G06N3/02, G06N3/04, G06N3/0454
Abstract: A method is disclosed herein that includes an act of causing a processor to access a deep-structured, layered or hierarchical model, called a deep convex network, retained in a computer-readable medium, wherein the deep-structured model comprises a plurality of layers with weights assigned thereto. This layered model can produce output that serves as scores to be combined with the transition probabilities between states in a hidden Markov model and with language model scores to form a full speech recognizer. The method makes joint use of nonlinear random projections and RBM (Restricted Boltzmann Machine) weights, and it stacks a lower module's output with the raw data to form the input of the module immediately above it. Batch-based convex optimization is performed to learn a portion of the deep convex network's weights, making the training amenable to parallel computation. The method can further include the act of jointly substantially optimizing the weights, the transition probabilities, and the language model scores of the deep-structured model using an optimization criterion based on a sequence rather than a set of unrelated frames.
摘要翻译：本文公开了一种方法，其包括使处理器访问被保留在计算机可读介质中的称为深凸网络的深层结构的分层或层次模型的动作，其中深层结构模型包括多个具有分配给它的权重。该分层模型可以产生作为分数的输出，以与隐藏的马尔可夫模型和语言模型分数中的状态之间的转移概率相结合，以形成完整的语音识别器。该方法联合使用非线性随机投影和RBM权重，并将较低模块的输出与原始数据叠加以建立其立即更高的模块。执行基于批次的凸优化来学习深凸网络权重的一部分，使其适合于并行计算以完成训练。该方法还可以包括使用基于序列而不是一组不相关帧的优化准则共同基本优化深层结构模型的权重，转移概率和语言模型分数的动作。
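
A toy version of the deep convex network's module stacking is sketched below. Each module maps its input through a fixed nonlinear random projection with sigmoid units, learns its upper-layer weights by closed-form ridge regression over the whole batch (a convex problem), and the next module receives the raw input concatenated with that module's output. The RBM weights, the HMM and language-model combination, and the sequence-level joint optimization from the abstract are omitted; the single output, the ridge term, and all function names are assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def train_module(inputs, targets, n_hidden, ridge=1e-2, seed=0):
    """One module: random sigmoid projection, then a convex ridge fit of the output weights."""
    rng = random.Random(seed)
    w = [[rng.gauss(0.0, 1.0) for _ in inputs[0]] for _ in range(n_hidden)]
    hidden = [[sigmoid(sum(wi * xi for wi, xi in zip(row, x))) for row in w] for x in inputs]
    # Normal equations (H^T H + ridge*I) u = H^T t over the whole batch; H has one row per sample.
    gram = [[sum(h[i] * h[j] for h in hidden) + (ridge if i == j else 0.0)
             for j in range(n_hidden)] for i in range(n_hidden)]
    rhs = [sum(h[i] * t for h, t in zip(hidden, targets)) for i in range(n_hidden)]
    u = solve_linear(gram, rhs)
    return [sum(ui * hi for ui, hi in zip(u, h)) for h in hidden]


def train_dcn(inputs, targets, n_modules=3, n_hidden=8):
    """Stack modules; each one sees the raw input plus the previous module's output."""
    current = [list(x) for x in inputs]
    outputs = []
    for k in range(n_modules):
        outputs = train_module(current, targets, n_hidden, seed=k)
        current = [list(x) + [y] for x, y in zip(inputs, outputs)]
    return outputs
```

Because each module's output weights come from a closed-form solve over the batch rather than from back-propagation through the stack, the per-module fit is what makes the training easy to parallelize in this picture.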

38. Granted invention patent

US08423364B2 Generic framework for large-margin MCE training in speech recognition (In force)
Publication No.: US08423364B2
Publication Date: 2013-04-16
Application No.: US11708440
Filing Date: 2007-02-20
Applicants: Dong Yu, Alejandro Acero, Li Deng, Xiaodong He
Inventors: Dong Yu, Alejandro Acero, Li Deng, Xiaodong He
IPC Classification: G10L15/14, G10L15/00, G10L15/06
CPC Classification: G10L15/063, G10L2015/0631
Abstract: A method and apparatus for training an acoustic model are disclosed. A training corpus is accessed and converted into an initial acoustic model. Given the initial acoustic model, scores are calculated for the correct class and for the competing classes of each token. A sample-adaptive window bandwidth is also calculated for each training token. From the calculated scores and the sample-adaptive window bandwidth values, loss values are calculated based on a loss function. The loss function, which may be derived from a Bayesian risk minimization viewpoint, can include a margin value that moves a decision boundary such that token-to-boundary distances for correct tokens that are near the decision boundary are maximized. The margin can either be fixed or vary monotonically as a function of algorithm iterations. The acoustic model is updated based on the calculated loss values. This process can be repeated until empirical convergence is reached.
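
The margin-shifted loss can be sketched as a sigmoid of a misclassification measure plus a margin, with the margin either held fixed or grown monotonically over iterations. The soft-max competitor term, the sigmoid form, the linear margin schedule, and the parameter names are assumptions for illustration; the sample-adaptive window bandwidth and the model update are not shown.

```python
import math


def misclassification(correct, competitors, eta=1.0):
    """d = -g_correct + soft-max of competitor scores; d > 0 means misrecognition."""
    top = max(competitors)
    mean_exp = sum(math.exp(eta * (c - top)) for c in competitors) / len(competitors)
    return -correct + top + math.log(mean_exp) / eta


def margin_loss(d, alpha=1.0, margin=0.0):
    """Sigmoid loss of (d + margin): correct tokens close to the boundary still incur loss."""
    z = alpha * (d + margin)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # stable form for negative z
    return e / (1.0 + e)


def margin_schedule(iteration, step=0.5, max_margin=2.0):
    """Margin that grows by `step` per iteration, capped at max_margin."""
    return min(max_margin, step * iteration)
```

A token whose correct score beats the best competitor by 2 has d = -2; with no margin it contributes little loss, but once the margin reaches 2 it sits at the sigmoid's midpoint and still pushes the boundary away.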

39. Granted invention patent

US08180637B2 High performance HMM adaptation with joint compensation of additive and convolutive distortions (In force)
Publication No.: US08180637B2
Publication Date: 2012-05-15
Application No.: US11949044
Filing Date: 2007-12-03
Applicants: Dong Yu, Li Deng, Alejandro Acero, Yifan Gong, Jinyu Li
Inventors: Dong Yu, Li Deng, Alejandro Acero, Yifan Gong, Jinyu Li
IPC Classification: G10L15/00, G10L15/20, G10L17/00
CPC Classification: G10L15/20, G10L15/142
Abstract: A method of compensating for additive and convolutive distortions applied to a signal indicative of an utterance is discussed. The method includes receiving a signal and initializing the noise mean and channel mean vectors. A Gaussian-dependent matrix and Hidden Markov Model (HMM) parameters are calculated or updated to account for additive noise from the noise mean vector or convolutive distortion from the channel mean vector. The HMM parameters are adapted by decoding the utterance using the previously calculated HMM parameters and adjusting the Gaussian-dependent matrix and the HMM parameters based upon data received during the decoding. The adapted HMM parameters are applied to decode the input utterance and provide a transcription of the utterance.
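
The mean and variance compensation behind joint additive and convolutive adaptation can be illustrated per log-spectral dimension with the usual first-order vector-Taylor-series relations, mu_y = mu_x + mu_h + log(1 + exp(mu_n - mu_x - mu_h)) and var_y = G^2 * var_x + (1 - G)^2 * var_n, where G plays the role of the Gaussian-dependent term. Working in the log-spectral rather than the cepstral domain (so no DCT matrices), ignoring channel variance, and the function name are simplifying assumptions; the re-estimation of the noise and channel means from decoded data is not shown.

```python
import math


def adapt_gaussian(mu_x, var_x, mu_n, var_n, mu_h):
    """Adapt one clean-speech Gaussian to additive noise n and channel h.

    All arguments are per-dimension lists in the log-spectral domain (assumption).
    """
    mu_y, var_y = [], []
    for mx, vx, mn, vn, mh in zip(mu_x, var_x, mu_n, var_n, mu_h):
        z = mn - mx - mh
        # softplus(z) = log(1 + exp(z)), computed without overflow
        softplus = z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))
        g = math.exp(-softplus)  # G = 1 / (1 + exp(z)) = dy/dx at the expansion point
        mu_y.append(mx + mh + softplus)
        var_y.append(g * g * vx + (1.0 - g) ** 2 * vn)
    return mu_y, var_y
```

When the noise is far below the speech, G is close to 1 and the adapted Gaussian is simply shifted by the channel mean; when noise and speech are equal, G is 0.5 and the mean rises by log 2.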

40. Granted invention patent

US07877256B2 Time synchronous decoding for long-span hidden trajectory model (In force)
Publication No.: US07877256B2
Publication Date: 2011-01-25
Application No.: US11356905
Filing Date: 2006-02-17
Applicants: Xiaolong Li, Li Deng, Dong Yu, Alejandro Acero
Inventors: Xiaolong Li, Li Deng, Dong Yu, Alejandro Acero
IPC Classification: G10L15/14
CPC Classification: G10L15/08
Abstract: A time-synchronous lattice-constrained search algorithm is developed and used to process a linguistic model of speech that has a long-contextual-span capability. In the algorithm, hypotheses are represented as traces that include an indication of a current frame, previous frames and future frames. Each frame can include an associated linguistic unit such as a phone or units that are derived from a phone. Additionally, pruning strategies can be applied to speed up the search. Further, word-ending recombination methods are developed to speed up the computation. These methods can effectively deal with an exponentially increased search space.
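
Two of the speed-ups mentioned, pruning and recombination, can be sketched over a list of partial hypotheses, each a log score plus its phone history. Merging hypotheses whose last `context` phones agree is only safe if the model's future scores depend on no more than that context, which is an assumption of this sketch; the beam width, the hypothesis cap, and the function names are hypothetical, and the lattice constraint and the frame-by-frame loop are not shown.

```python
def recombine(hyps, context=2):
    """Keep only the best-scoring hypothesis for each recent phone context.

    hyps: list of (log_score, phone_history) pairs, phone_history being a tuple.
    """
    best = {}
    for score, history in hyps:
        key = history[-context:]
        if key not in best or score > best[key][0]:
            best[key] = (score, history)
    return list(best.values())


def prune(hyps, beam=10.0, max_hyps=100):
    """Beam pruning relative to the best score, then cap the number of survivors."""
    if not hyps:
        return []
    top = max(score for score, _ in hyps)
    survivors = [h for h in hyps if h[0] >= top - beam]
    survivors.sort(key=lambda h: h[0], reverse=True)
    return survivors[:max_hyps]
```

In a time-synchronous decoder these two calls would run once per frame after all hypotheses have been extended, keeping the active set small even though the number of possible histories grows exponentially.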
