    • 1. Granted invention patent
    • Feneme-based Markov models for words
    • US5165007A
    • 1992-11-17
    • US366231
    • 1989-06-12
    • Lalit R. Bahl; Peter V. DeSouza; Robert L. Mercer; Michael A. Picheny
    • G10L15/02; G10L15/06; G10L15/14
    • G10L15/142; G10L2015/0631
    • In a speech recognition system, apparatus and method for modelling words with label-based Markov models is disclosed. The modelling includes: entering a first speech input, corresponding to words in a vocabulary, into an acoustic processor which converts each spoken word into a sequence of standard labels, where each standard label corresponds to a sound type assignable to an interval of time; representing each standard label as a probabilistic model which has a plurality of states, at least one transition from a state to a state, and at least one settable output probability at some transitions; entering selected acoustic inputs into an acoustic processor which converts the selected acoustic inputs into personalized labels, each personalized label corresponding to a sound type assigned to an interval of time; and setting each output probability as the probability of the standard label represented by a given model producing a particular personalized label at a given transition in the given model. The present invention addresses the problem of generating models of words simply and automatically in a speech recognition system.
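As a rough illustration of the last step in the abstract above, the following sketch sets each output probability to the relative frequency with which a standard-label model, at a given transition, coincides with a personalized label. The function name `estimate_output_probs`, the label names, and the assumption that the training data is already aligned into (standard label, transition, personalized label) triples are all hypothetical. The abstract does not say how the probabilities are estimated, and a real system may instead use an iterative procedure such as forward-backward training.

```python
from collections import Counter, defaultdict


def estimate_output_probs(aligned):
    """Estimate P(personalized label | standard-label model, transition).

    `aligned` is an iterable of (standard_label, transition, personalized_label)
    triples. The fixed alignment is an assumption made for this sketch.
    """
    counts = defaultdict(Counter)
    for standard, transition, personalized in aligned:
        counts[(standard, transition)][personalized] += 1

    probs = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        # Relative-frequency estimate for each personalized label.
        probs[key] = {label: n / total for label, n in counter.items()}
    return probs


# Hypothetical toy data: model "S1", transition 0, saw "p7" twice and "p3" once.
aligned = [
    ("S1", 0, "p7"),
    ("S1", 0, "p7"),
    ("S1", 0, "p3"),
    ("S2", 0, "p3"),
]
probs = estimate_output_probs(aligned)
```

Here `probs[("S1", 0)]` assigns two thirds of the mass to `"p7"` and one third to `"p3"`, so a speaker whose acoustic processor tends to emit `"p7"` where the standard label `"S1"` is expected gets a model tuned to that habit.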
    • 2. Granted invention patent
    • Speech recognition employing a set of Markov models that includes Markov models representing transitions to and from silence
    • US4977599A
    • 1990-12-11
    • US289447
    • 1988-12-15
    • Lalit R. Bahl; Peter V. DeSouza; Robert L. Mercer; Michael A. Picheny
    • G10L15/02; G10L15/06; G10L15/14
    • G10L15/02; G10L15/142; G10L2015/0631
    • Apparatus and method for constructing word baseforms which can be matched against a string of generated acoustic labels. A set of phonetic phone machines are formed, wherein each phone machine has (i) a plurality of states, (ii) a plurality of transitions each of which extends from a state to a state, (iii) a stored probability for each transition, and (iv) stored label output probabilities, each label output probability corresponding to the probability of each phone machine producing a corresponding label. The set of phonetic machines is formed to include a subset of onset phone machines. The stored probabilities of each onset phone machine correspond to at least one phonetic element being uttered at the beginning of a speech segment. The set of phonetic machines is formed to include a subset of trailing phone machines. The stored probabilities of each trailing phone machine correspond to at least one single phonetic element being uttered at the end of a speech segment. Word baseforms are constructed by concatenating phone machines selected from the set.
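The concatenation described in the abstract above can be sketched as follows. This is a minimal illustration, not the patented procedure: the `PhoneMachine` fields, the `build_baseform` function, the three inventories (`common`, `onset`, `trailing`), and the rule "use the onset variant for the first phone and the trailing variant for the last phone when one exists" are all assumptions introduced here. The probabilities are placeholders.

```python
from dataclasses import dataclass


@dataclass
class PhoneMachine:
    name: str
    transition_probs: dict  # (from_state, to_state) -> stored probability
    label_probs: dict       # acoustic label -> stored output probability


def make_machine(name):
    # Placeholder probabilities; real values would come from training.
    return PhoneMachine(name, {(0, 1): 1.0}, {"L1": 1.0})


def build_baseform(phones, common, onset, trailing):
    """Concatenate phone machines into a word baseform.

    Assumed rule: the first phone uses an onset machine and the last phone
    uses a trailing machine whenever such a variant is available.
    """
    machines = []
    last = len(phones) - 1
    for i, phone in enumerate(phones):
        if i == 0 and phone in onset:
            machines.append(onset[phone])
        elif i == last and phone in trailing:
            machines.append(trailing[phone])
        else:
            machines.append(common[phone])
    return machines


common = {p: make_machine(p) for p in ("K", "AE", "T")}
onset = {"K": make_machine("K_onset")}
trailing = {"T": make_machine("T_trail")}

baseform = build_baseform(["K", "AE", "T"], common, onset, trailing)
names = [m.name for m in baseform]
```

With these hypothetical inventories, `names` is `["K_onset", "AE", "T_trail"]`: the word starts with a machine trained on speech following silence and ends with one trained on speech preceding silence.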
    • 4. Granted invention patent
    • Constructing Markov models of words from multiple utterances
    • US4759068A
    • 1988-07-19
    • US738933
    • 1985-05-29
    • Lalit R. Bahl; Peter V. DeSouza; Robert L. Mercer; Michael A. Picheny
    • G10L15/14; G10L5/00
    • G10L15/14
    • Speech recognition is improved by splitting each feneme string at a consistent point into a left portion and a right portion. The present invention addresses the problem of constructing fenemic baseforms which take into account variations in pronunciation of words from one utterance thereof to another. Specifically, the invention relates to a method of constructing a fenemic baseform for a word in a vocabulary of word segments including the steps of: (a) transforming multiple utterances of the word into respective strings of fenemes; (b) defining a set of fenemic Markov model phone machines; (c) determining the best single phone machine P1 for producing the multiple feneme strings; (d) determining the best two phone baseform of the form P1 P2 or P2 P1 for producing the multiple feneme strings; (e) aligning the best two phone baseform against each feneme string; (f) splitting each feneme string into a left portion and a right portion with the left portion corresponding to the first phone machine of the two phone baseform and the right portion corresponding to the second phone machine of the two phone baseform; (g) identifying each left portion as a left substring and each right portion as a right substring; (h) processing the set of left substrings and the set of right substrings in the same manner as the set of feneme strings corresponding to the multiple utterances including the further step of inhibiting further splitting of a substring when the single phone baseform thereof has a higher probability of producing the substring than does the best two phone baseform; and (k) concatenating the unsplit single phones in an order corresponding to the order of the feneme substrings to which they correspond.
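Steps (c) through (k) of the abstract above form a recursive divide-and-conquer procedure, which the sketch below mirrors on toy data. It rests on strong simplifying assumptions that are not in the source: each phone machine is reduced to a table of independent per-feneme emission probabilities (`emit`, hypothetical), transition probabilities are ignored, and alignment is a search for the split point that maximizes the log-probability. The names `log_score`, `best_split`, and `build_fenemic_baseform` are illustrative only.

```python
import math


def log_score(phone, string, emit):
    # Log-probability of a feneme string under one phone (toy model).
    return sum(math.log(emit[phone][f]) for f in string)


def best_split(pair, string, emit):
    # Align a two-phone baseform: pick the split point with the best score.
    first, second = pair
    scored = [
        (log_score(first, string[:k], emit) + log_score(second, string[k:], emit), k)
        for k in range(1, len(string))
    ]
    score, k = max(scored, key=lambda t: t[0])
    return k, score


def build_fenemic_baseform(strings, emit):
    # (c) best single phone P1 for all strings.
    p1 = max(emit, key=lambda p: sum(log_score(p, s, emit) for s in strings))
    single = sum(log_score(p1, s, emit) for s in strings)
    if min(len(s) for s in strings) < 2:
        return [p1]  # a one-feneme string cannot be split further

    # (d) and (e) best two-phone baseform P1 P2 or P2 P1, aligned to each string.
    best = None
    for p2 in emit:
        if p2 == p1:
            continue
        for pair in ((p1, p2), (p2, p1)):
            splits = [best_split(pair, s, emit) for s in strings]
            total = sum(score for _, score in splits)
            if best is None or total > best[0]:
                best = (total, [k for k, _ in splits])

    # (h) inhibit splitting when the single phone is at least as probable.
    if best is None or single >= best[0]:
        return [p1]

    # (f) and (g) split into left and right substrings, then recurse;
    # (k) concatenate the unsplit phones in order.
    left = [s[:k] for s, k in zip(strings, best[1])]
    right = [s[k:] for s, k in zip(strings, best[1])]
    return build_fenemic_baseform(left, emit) + build_fenemic_baseform(right, emit)


# Hypothetical phones: each emits "its" feneme with probability 0.8.
fenemes = "ABC"
emit = {"P" + f: {g: (0.8 if g == f else 0.1) for g in fenemes} for f in fenemes}
utterances = ["AAABBC", "AABBCC", "AAABC"]
baseform = build_fenemic_baseform(utterances, emit)
```

On this toy input the three utterances differ in length and in where each sound ends, yet the recursion settles on the same three-phone baseform `["PA", "PB", "PC"]`, which is the kind of pronunciation variation the abstract says the method is meant to absorb.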
    • 8. Granted invention patent
    • Fast algorithm for deriving acoustic prototypes for automatic speech recognition
    • US5276766A
    • 1994-01-04
    • US730714
    • 1991-07-16
    • Lalit R. Bahl; Jerome R. Bellegarda; Peter V. DeSouza; David Nahamoo; Michael A. Picheny
    • G10L19/00; G10L15/02; G10L15/06; G10L9/04
    • G10L15/063
    • An apparatus for generating a set of acoustic prototype signals for encoding speech includes a memory for storing a training script model comprising a series of word-segment models. Each word-segment model comprises a series of elementary models. An acoustic measure is provided for measuring the value of at least one feature of an utterance of the training script during each of a series of time intervals to produce a series of feature vector signals representing the feature values of the utterance. An acoustic matcher is provided for estimating at least one path through the training script model which would produce the entire series of measured feature vector signals. From the estimated path, the elementary model in the training script model which would produce each feature vector signal is estimated. The apparatus further comprises a cluster processor for clustering the feature vector signals into a plurality of clusters. Each feature vector signal in a cluster corresponds to a single elementary model in a single location in a single word-segment model. Each cluster signal has a cluster value equal to an average of the feature values of all feature vectors in the signal. Finally, the apparatus includes a memory for storing a plurality of prototype vector signals. Each prototype vector signal corresponds to an elementary model, has an identifier, and comprises at least two partition values. The partition values are equal to combinations of the cluster values of one or more cluster signals corresponding to the elementary model.
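The clustering step in the abstract above can be sketched in a few lines. The sketch assumes the acoustic match has already labelled every feature vector with its elementary model, word-segment model, and location, and it takes the simplest possible "combination" of cluster values, namely one partition value per cluster. The function names, the tuple layout, and the sample data are hypothetical.

```python
from collections import defaultdict


def mean(vectors):
    # Component-wise average of equally sized feature vectors.
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def derive_prototypes(aligned):
    """Group feature vectors into clusters and build prototypes.

    `aligned` is a list of (elementary_model, word_segment, location, vector)
    tuples, assumed to come from a previously estimated alignment path.
    """
    # One cluster per (elementary model, word segment, location).
    clusters = defaultdict(list)
    for model, segment, location, vector in aligned:
        clusters[(model, segment, location)].append(vector)

    # Each prototype collects the cluster means for its elementary model.
    prototypes = defaultdict(list)
    for (model, _segment, _location), vectors in clusters.items():
        prototypes[model].append(mean(vectors))
    return dict(prototypes)


aligned = [
    ("E1", "w1", 0, [1.0, 2.0]),
    ("E1", "w1", 0, [3.0, 4.0]),
    ("E1", "w2", 1, [10.0, 10.0]),
    ("E2", "w1", 1, [0.0, 0.0]),
    ("E2", "w2", 0, [2.0, 2.0]),
]
prototypes = derive_prototypes(aligned)
```

Because the clusters are fixed by the alignment rather than found by an iterative search over the vectors, a single pass over the data suffices in this sketch, which fits the "fast algorithm" of the title.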