    • 1. Patent application
    • MODEL TRAINING FOR AUTOMATIC SPEECH RECOGNITION FROM IMPERFECT TRANSCRIPTION DATA
    • US20100318355A1
    • 2010-12-16
    • US12482142
    • 2009-06-10
    • Jinyu Li, Yifan Gong, Chaojun Liu, Kaisheng Yao
    • Jinyu Li, Yifan Gong, Chaojun Liu, Kaisheng Yao
    • G10L15/06
    • G10L15/063, G10L15/065
    • Techniques and systems for training an acoustic model are described. In an embodiment, a technique for training an acoustic model includes dividing a corpus of training data that includes transcription errors into N parts, and on each part, decoding an utterance with an incremental acoustic model and an incremental language model to produce a decoded transcription. The technique may further include inserting silence between a pair of words into the decoded transcription and aligning an original transcription corresponding to the utterance with the decoded transcription according to time for each part. The technique may further include selecting a segment from the utterance having at least Q contiguous matching aligned words, and training the incremental acoustic model with the selected segment. The trained incremental acoustic model may then be used on a subsequent part of the training data. Other embodiments are described and claimed.
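The segment-selection step in the abstract above (keeping only stretches with at least Q contiguous matching aligned words) can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the time alignment has already been done and is given as a list of (original word, decoded word) pairs, with `None` marking a gap. The name `select_segments` and the pair representation are hypothetical.

```python
from typing import List, Optional, Tuple

# One aligned position: (word from original transcription, word from decoded
# transcription). None marks an insertion or deletion. This representation is
# an assumption made for illustration only.
AlignedPair = Tuple[Optional[str], Optional[str]]


def select_segments(aligned: List[AlignedPair], q: int) -> List[Tuple[int, int]]:
    """Return [start, end) index ranges of runs with at least q contiguous matches."""
    segments: List[Tuple[int, int]] = []
    run_start: Optional[int] = None
    for i, (orig, dec) in enumerate(aligned):
        is_match = orig is not None and orig == dec
        if is_match:
            if run_start is None:
                run_start = i  # a new run of matching words begins here
        else:
            # The run (if any) ends at i; keep it only if it is long enough.
            if run_start is not None and i - run_start >= q:
                segments.append((run_start, i))
            run_start = None
    # Close a run that extends to the end of the utterance.
    if run_start is not None and len(aligned) - run_start >= q:
        segments.append((run_start, len(aligned)))
    return segments


if __name__ == "__main__":
    pairs = [("the", "the"), ("cat", "cat"), ("sat", "sat"), ("on", "in"),
             ("the", "the"), ("mat", "mat")]
    print(select_segments(pairs, q=3))
```

In this sketch the returned index ranges would identify the parts of the utterance passed on to retrain the incremental acoustic model; shorter matching runs are discarded as unreliable.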
    • 2. Granted patent
    • Model training for automatic speech recognition from imperfect transcription data
    • US09280969B2
    • 2016-03-08
    • US12482142
    • 2009-06-10
    • Jinyu Li, Yifan Gong, Chaojun Liu, Kaisheng Yao
    • Jinyu Li, Yifan Gong, Chaojun Liu, Kaisheng Yao
    • G10L15/00, G10L15/06, G10L15/065
    • G10L15/063, G10L15/065
    • Techniques and systems for training an acoustic model are described. In an embodiment, a technique for training an acoustic model includes dividing a corpus of training data that includes transcription errors into N parts, and on each part, decoding an utterance with an incremental acoustic model and an incremental language model to produce a decoded transcription. The technique may further include inserting silence between a pair of words into the decoded transcription and aligning an original transcription corresponding to the utterance with the decoded transcription according to time for each part. The technique may further include selecting a segment from the utterance having at least Q contiguous matching aligned words, and training the incremental acoustic model with the selected segment. The trained incremental acoustic model may then be used on a subsequent part of the training data. Other embodiments are described and claimed.
    • 3. Granted patent
    • Subspace speech adaptation
    • US08700400B2
    • 2014-04-15
    • US12982401
    • 2010-12-30
    • Daniel Povey, Kaisheng Yao, Yifan Gong
    • Daniel Povey, Kaisheng Yao, Yifan Gong
    • G10L15/06, G10L15/00, G10L15/14
    • G10L15/065
    • Subspace speech adaptation may be utilized for facilitating the recognition of speech containing short utterances. Speech training data may be received in a speech model by a computer. A first matrix may be determined for preconditioning speech statistics based on the speech training data. A second matrix may be determined for representing a basis for the speech to be recognized. A set of basis matrices may then be determined from the first matrix and the second matrix. Speech test data including a short utterance may then be received by the computer. The computer may then apply the set of basis matrices to the speech test data to produce a transcription. The transcription may represent speech recognition of the short utterance.
    • 5. Patent application
    • SYSTEM AND METHOD FOR DEVELOPING HIGH ACCURACY ACOUSTIC MODELS BASED ON AN IMPLICIT PHONE-SET DETERMINATION-BASED STATE-TYING TECHNIQUE
    • US20070233481A1
    • 2007-10-04
    • US11278504
    • 2006-04-03
    • Kaisheng Yao
    • Kaisheng Yao
    • G10L15/06
    • G10L15/187, G10L15/063
    • A system for, and method of, developing high accuracy acoustic models and a digital signal processor incorporating the same. In one embodiment, the system includes: (1) an acoustic model initializer configured to generate initial acoustic models by seeding with seed monophones, (2) a monophone retrainer associated with the acoustic model initializer and configured to retrain the monophones using a target database, (3) a triphone generator associated with the monophone retrainer and configured to generate seed triphones from the monophones using aligned training data, (4) a triphone retrainer associated with the triphone generator and configured to retrain the triphones using the target database and (5) a triphone clusterer associated with the triphone retrainer and configured to cluster the triphones using a state-tying technique, the triphone retrainer configured to retrain the triphones again using the target database.
    • 6. Patent application
    • System and method for noisy automatic speech recognition employing joint compensation of additive and convolutive distortions
    • US20070033028A1
    • 2007-02-08
    • US11298332
    • 2005-12-09
    • Kaisheng Yao
    • Kaisheng Yao
    • G10L15/20
    • G10L15/20
    • A system for, and method of, noisy automatic speech recognition employing joint compensation of additive and convolutive distortions and a digital signal processor incorporating the system or the method. In one embodiment, the system includes: (1) an additive distortion factor estimator configured to estimate an additive distortion factor, (2) an acoustic model compensator coupled to the additive distortion factor estimator and configured to use estimates of a convolutive distortion factor and the additive distortion factor to compensate acoustic models and recognize a current utterance, (3) an utterance aligner coupled to the acoustic model compensator and configured to align the current utterance using recognition output and (4) a convolutive distortion factor estimator coupled to the utterance aligner and configured to estimate an updated convolutive distortion factor based on the current utterance using differential terms but disregarding log-spectral domain variance terms.
    • 8. Patent application
    • SYSTEM AND METHOD FOR TEXT-TO-PHONEME MAPPING WITH PRIOR KNOWLEDGE
    • US20070233490A1
    • 2007-10-04
    • US11278497
    • 2006-04-03
    • Kaisheng Yao
    • Kaisheng Yao
    • G10L13/08
    • G10L13/08
    • A system for, and method of, text-to-phoneme (TTP) mapping and a digital signal processor (DSP) incorporating the system or the method. In one embodiment, the system includes: (1) a letter-to-phoneme (LTP) mapping generator configured to generate an LTP mapping by iteratively aligning a full training set with a set of correctly aligned entries based on statistics of phonemes and letters from the set of correctly aligned entries and redefining the full training set as a union of the set of correctly aligned entries and a set of incorrectly aligned entries created during the aligning and (2) a model trainer configured to update prior probabilities of LTP mappings generated by the LTP generator and evaluate whether the LTP mappings are suitable for training a decision-tree-based pronunciation model (DTPM).
    • 9. Patent application
    • System and method for combined state- and phone-level and multi-stage phone-level pronunciation adaptation for speaker-independent name dialing
    • US20070198265A1
    • 2007-08-23
    • US11359973
    • 2006-02-22
    • Kaisheng Yao
    • Kaisheng Yao
    • G10L15/04
    • G10L15/144, G10L15/065, G10L2015/025
    • A system for, and method of, combined state- and phone-level pronunciation adaptation. One embodiment of the system includes: (1) a state-level pronunciation variation analyzer configured to use an alignment process to compare base forms of words with alternate pronunciations and generate a confusion matrix, (2) a state-level pronunciation adapter associated with the state-level pronunciation variation analyzer and configured to employ the confusion matrix to generate, in plural states, sets of Gaussian mixture components corresponding to alternative pronunciation realizations and enlarge the sets by tying the Gaussian mixture components across the states based on distances among the Gaussian mixture components and (3) a phone-level pronunciation adapter associated with the state-level pronunciation adapter and configured to employ phone-level re-write rules to generate multiple pronunciation entries. The phone-level pronunciation adapter may be embodied in multiple stages.
    • 10. Granted patent
    • Weighted sequential variance adaptation with prior knowledge for noise robust speech recognition
    • US08180635B2
    • 2012-05-15
    • US12347504
    • 2008-12-31
    • Xiaodong Cui, Kaisheng Yao
    • Xiaodong Cui, Kaisheng Yao
    • G10L15/20
    • G10L15/20
    • A method for adapting acoustic models used for automatic speech recognition is provided. The method includes estimating noise in a portion of a speech signal; determining a first estimated variance scaling vector using an estimated second-order polynomial and the noise estimate, where the polynomial represents prior knowledge of the dependency of a variance scaling vector on noise; determining a second estimated variance scaling vector using statistics from prior portions of the speech signal; determining a variance scaling factor using the first and second estimated variance scaling vectors; and using the variance scaling factor to adapt an acoustic model.
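The data flow in the abstract above (a polynomial prior on the noise level, a statistics-based estimate, and a combination of the two) can be sketched as follows. The abstract does not say how the two vectors are combined or how the scaling is applied to the model, so the linear interpolation weight and the elementwise multiplication of variances below are assumptions for illustration only. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def prior_scaling(noise: float,
                  coeffs: Sequence[Tuple[float, float, float]]) -> List[float]:
    """First estimate: evaluate a second-order polynomial a*n^2 + b*n + c
    per feature dimension at the estimated noise level n."""
    return [a * noise * noise + b * noise + c for a, b, c in coeffs]


def combine_scaling(prior: Sequence[float],
                    stats: Sequence[float],
                    weight: float) -> List[float]:
    """Combine the prior-based and statistics-based estimates.
    ASSUMPTION: simple linear interpolation; the abstract does not give the rule."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(prior) != len(stats):
        raise ValueError("vectors must have the same dimension")
    return [weight * p + (1.0 - weight) * s for p, s in zip(prior, stats)]


def adapt_variances(variances: Sequence[float],
                    scaling: Sequence[float]) -> List[float]:
    """ASSUMPTION: adaptation multiplies each variance by its scaling factor."""
    return [v * s for v, s in zip(variances, scaling)]


if __name__ == "__main__":
    rho_prior = prior_scaling(2.0, [(0.0, 0.5, 1.0), (0.1, 0.0, 1.0)])
    rho = combine_scaling(rho_prior, [1.0, 1.0], weight=0.25)
    print(adapt_variances([4.0, 2.0], rho))
```

A weight near 1 would trust the noise-based prior, which may be reasonable early in a signal when few statistics have accumulated; a weight near 0 would trust the accumulated statistics.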