    • 43. Invention application
    • Noise Suppressor for Robust Speech Recognition
    • US20100153104A1
    • 2010-06-17
    • US12335558
    • 2008-12-16
    • Dong Yu, Li Deng, Yifan Gong, Jian Wu, Alejandro Acero
    • G10L15/20
    • G10L21/0208, G10L15/20
    • Described is noise reduction technology generally for speech input in which a noise-suppression related gain value for the frame is determined based upon a noise level associated with that frame in addition to the signal to noise ratios (SNRs). In one implementation, a noise reduction mechanism is based upon minimum mean square error, Mel-frequency cepstra noise reduction technology. A high gain value (e.g., one) is set to accomplish little or no noise suppression when the noise level is below a threshold low level, and a low gain value set or computed to accomplish large noise suppression above a threshold high noise level. A noise-power dependent function, e.g., a log-linear interpolation, is used to compute the gain between the thresholds. Smoothing may be performed by modifying the gain value based upon a prior frame's gain value. Also described is learning parameters used in noise reduction via a step-adaptive discriminative learning algorithm.
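
The gain rule in the abstract above can be sketched as follows. This is a minimal illustration, not the patented method: the threshold values, the floor gain `min_gain`, the smoothing weight `alpha`, and both function names are assumptions, and "log-linear interpolation" is read here as interpolation that is linear in the logarithms of noise power and gain. The SNR-dependent part of the gain is left out.

```python
import math


def noise_dependent_gain(noise_power, low_thresh, high_thresh, min_gain):
    """Return a suppression gain in [min_gain, 1.0] for one frame.

    Below low_thresh the gain is 1.0 (little or no suppression); above
    high_thresh it is min_gain (strong suppression); in between it is
    interpolated linearly in the log domain. All parameters are hypothetical.
    """
    if noise_power <= low_thresh:
        return 1.0
    if noise_power >= high_thresh:
        return min_gain
    # Fractional position of the noise power between the thresholds (log scale).
    t = (math.log(noise_power) - math.log(low_thresh)) / (
        math.log(high_thresh) - math.log(low_thresh)
    )
    # log(gain) moves linearly from log(1.0) = 0 down to log(min_gain).
    return math.exp(t * math.log(min_gain))


def smooth_gain(gain, prev_gain, alpha=0.5):
    """Blend the current gain with the previous frame's gain (assumed form)."""
    return alpha * prev_gain + (1.0 - alpha) * gain


if __name__ == "__main__":
    prev = 1.0
    for power in (5.0, 50.0, 500.0, 5000.0):
        g = smooth_gain(noise_dependent_gain(power, 10.0, 1000.0, 0.1), prev)
        print(f"noise power {power:8.1f} -> smoothed gain {g:.3f}")
        prev = g
```

Smoothing against the previous frame keeps the gain from jumping when the noise estimate crosses a threshold between adjacent frames.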
    • 44. Invention application
    • PIECEWISE-BASED VARIABLE-PARAMETER HIDDEN MARKOV MODELS AND THE TRAINING THEREOF
    • US20100070279A1
    • 2010-03-18
    • US12211114
    • 2008-09-16
    • Dong Yu, Li Deng, Yifan Gong, Alejandro Acero
    • G10L15/14
    • G10L15/144
    • A speech recognition system uses Gaussian mixture variable-parameter hidden Markov models (VPHMMs) to recognize speech under many different conditions. Each Gaussian mixture component of the VPHMMs is characterized by a mean parameter μ and a variance parameter Σ. Each of these Gaussian parameters varies as a function of at least one environmental conditioning parameter, such as, but not limited to, instantaneous signal-to-noise-ratio (SNR). The way in which a Gaussian parameter varies with the environmental conditioning parameter(s) can be approximated as a piecewise function, such as a cubic spline function. Further, the recognition system formulates the mean parameter μ and the variance parameter Σ of each Gaussian mixture component in an efficient form that accommodates the use of discriminative training and parameter sharing. Parameter sharing is carried out so that the otherwise very large number of parameters in the VPHMMs can be effectively reduced with practically feasible amounts of training data.
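
To make the idea of a Gaussian parameter that varies with SNR concrete, the sketch below evaluates a mean as a piecewise cubic function of instantaneous SNR. It is an illustration under assumptions: the knot positions, the per-segment coefficients, and the function name are made up, and the code only evaluates given cubic pieces. It does not fit a spline or enforce smoothness at the knots, which a true cubic spline would.

```python
from bisect import bisect_right


def piecewise_cubic(x, knots, coeffs):
    """Evaluate a piecewise cubic at x.

    knots  : sorted breakpoints [k0, k1, ..., kK] (e.g. SNR values in dB)
    coeffs : K tuples (c0, c1, c2, c3); on segment i the value is
             c0 + c1*d + c2*d**2 + c3*d**3 with d = x - knots[i]
    Inputs outside [k0, kK] are clamped to the nearest end.
    """
    x = min(max(x, knots[0]), knots[-1])
    # Index of the segment containing x; the last knot belongs to the last segment.
    i = min(bisect_right(knots, x) - 1, len(coeffs) - 1)
    d = x - knots[i]
    c0, c1, c2, c3 = coeffs[i]
    # Horner form of the cubic.
    return c0 + d * (c1 + d * (c2 + d * c3))


if __name__ == "__main__":
    # Hypothetical mean of one Gaussian component as a function of SNR (dB).
    snr_knots = [0.0, 10.0, 20.0]
    mean_coeffs = [(1.0, 0.1, 0.0, 0.0), (2.0, 0.0, 0.01, 0.0)]
    for snr in (-5.0, 5.0, 10.0, 15.0, 25.0):
        print(snr, piecewise_cubic(snr, snr_knots, mean_coeffs))
```

The same evaluator could serve for the variance parameter, provided the coefficients keep the result positive.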
    • 48. Granted patent
    • Speaker adaptive learning of resonance targets in a hidden trajectory model of speech coarticulation
    • US07519531B2
    • 2009-04-14
    • US11093833
    • 2005-03-30
    • Alejandro Acero, Dong Yu, Li Deng
    • G10L19/06
    • G10L15/07, G10L2015/0638
    • A computer-implemented method is provided for training a hidden trajectory model, of a speech recognition system, which generates Vocal Tract Resonance (VTR) targets. The method includes obtaining generic VTR target parameters corresponding to a generic speaker used by a target selector to generate VTR target sequences. The generic VTR target parameters are scaled for a particular speaker using a speaker-dependent scaling factor for the particular speaker to generate speaker-adaptive VTR target parameters. This scaling is performed for both the training data and the test data, and for the training data, the scaling is performed iteratively with the process of obtaining the generic targets. The computation of the scaling factor makes use of the results of a VTR tracker. The speaker-adaptive VTR target parameters for the particular speaker are then stored in order to configure the hidden trajectory model to perform speech recognition for the particular speaker using the speaker-adaptive VTR target parameters.
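
The scaling step in the abstract above amounts to multiplying generic VTR targets by a per-speaker factor. The sketch below shows only that step. The abstract says the factor is computed from VTR tracker results but not how, so the ratio-of-means estimator here is purely an illustrative assumption, as are the function names and the frequency values.

```python
def speaker_scaling_factor(tracked_vtrs, generic_targets):
    """Hypothetical estimator: mean tracked resonance over mean generic target.

    The actual computation in the patent is not given in the abstract.
    """
    mean_tracked = sum(tracked_vtrs) / len(tracked_vtrs)
    mean_generic = sum(generic_targets) / len(generic_targets)
    return mean_tracked / mean_generic


def adapt_targets(generic_targets, scale):
    """Scale generic VTR targets (Hz) to obtain speaker-adaptive targets."""
    return [scale * f for f in generic_targets]


if __name__ == "__main__":
    generic = [500.0, 1500.0, 2500.0]   # made-up generic targets (Hz)
    tracked = [550.0, 1650.0, 2750.0]   # made-up tracker output for one speaker
    scale = speaker_scaling_factor(tracked, generic)
    print(scale, adapt_targets(generic, scale))
```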
    • 49. Invention application
    • METHOD OF PATTERN RECOGNITION USING NOISE REDUCTION UNCERTAINTY
    • US20080281591A1
    • 2008-11-13
    • US12180260
    • 2008-07-25
    • James G. Droppo, Alejandro Acero, Li Deng
    • G10L15/20
    • G10L21/0208, G10L15/20
    • A method and apparatus are provided for using the uncertainty of a noise-removal process during pattern recognition. In particular, noise is removed from a representation of a portion of a noisy signal to produce a representation of a cleaned signal. In the meantime, an uncertainty associated with the noise removal is computed and is used with the representation of the cleaned signal to modify a probability for a phonetic state in the recognition system. In particular embodiments, the uncertainty is used to modify a probability distribution, by increasing the variance in each Gaussian distribution by the amount equal to the estimated variance of the cleaned signal, which is used in decoding the phonetic state sequence in a pattern recognition task.
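
The variance adjustment described in the abstract above can be illustrated for a single scalar feature: the Gaussian of a phonetic state is scored with its variance increased by the estimated variance (uncertainty) of the cleaned signal. This is a one-dimensional sketch with assumed function names and made-up numbers. A real recognizer would apply it per dimension and per mixture component.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a univariate Gaussian N(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def uncertainty_log_likelihood(cleaned, uncertainty_var, mean, var):
    """Score a cleaned feature against a state Gaussian whose variance is
    widened by the noise-removal uncertainty."""
    return log_gaussian(cleaned, mean, var + uncertainty_var)


if __name__ == "__main__":
    # A cleaned feature that sits far from the state mean.
    x, mean, var = 3.0, 0.0, 1.0
    for u in (0.0, 1.0, 4.0):
        print(f"uncertainty {u}: log-likelihood "
              f"{uncertainty_log_likelihood(x, u, mean, var):.4f}")
```

Widening the variance makes the recognizer less confident about frames where noise removal was unreliable, so such frames penalize a mismatched state less heavily.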
    • 50. Invention application
    • Generic framework for large-margin MCE training in speech recognition
    • US20080201139A1
    • 2008-08-21
    • US11708440
    • 2007-02-20
    • Dong Yu, Alejandro Acero, Li Deng, Xiaodong He
    • G10L15/00
    • G10L15/063, G10L2015/0631
    • A method and apparatus for training an acoustic model are disclosed. A training corpus is accessed and converted into an initial acoustic model. Scores are calculated for a correct class and competitive classes, respectively, for each token given the initial acoustic model. Also, a sample-adaptive window bandwidth is calculated for each training token. From the calculated scores and the sample-adaptive window bandwidth values, loss values are calculated based on a loss function. The loss function, which may be derived from a Bayesian risk minimization viewpoint, can include a margin value that moves a decision boundary such that token-to-boundary distances for correct tokens that are near the decision boundary are maximized. The margin can either be a fixed margin or can vary monotonically as a function of algorithm iterations. The acoustic model is updated based on the calculated loss values. This process can be repeated until an empirical convergence is met.
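
A common way to write a margin-based MCE loss is a sigmoid of a misclassification measure shifted by the margin. The sketch below follows that textbook form and should not be read as the exact loss in the application. The per-token `slope` only loosely stands in for the sample-adaptive window bandwidth, and the linear margin schedule, the function names, and all numeric values are assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mce_loss(correct_score, competitor_scores, margin=0.0, slope=1.0):
    """Smoothed 0/1 loss for one training token.

    d > 0 means the best competing class outscored the correct class.
    Adding a positive margin shifts the decision boundary so that correct
    tokens near the boundary still incur loss.
    """
    d = max(competitor_scores) - correct_score
    return sigmoid(slope * (d + margin))


def margin_schedule(iteration, step=0.1, max_margin=1.0):
    """Hypothetical margin that grows monotonically with the iteration count."""
    return min(max_margin, step * iteration)


if __name__ == "__main__":
    for it in (0, 5, 10):
        m = margin_schedule(it)
        print(it, m, mce_loss(2.0, [1.5, 0.3], margin=m))
```

With a margin of zero this reduces to a conventional MCE loss; increasing the margin over iterations pushes correctly classified tokens further from the boundary.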