Quick Patent Search - Search global patents quickly, free commercial patent database - IPRDB

1. Granted invention patent

US4748670A Apparatus and method for determining a likely word sequence from labels generated by an acoustic processor (Expired)
Publication No.: US4748670A
Publication date: 1988-05-31
Application No.: US738911
Filing date: 1985-05-29
Applicant: Lalit R. Bahl, Frederick Jelinek
Inventor: Lalit R. Bahl, Frederick Jelinek
IPC classification: G10L15/14, G10L5/00
CPC classification: G10L15/14
Abstract: Continuous speech recognition is improved by use of a known vocabulary and context probabilities. First, the unknown utterance is analyzed as a sequence of phonemes, then each phoneme labelled to form a string of labels. The shortest label interval which is recognized as a word is assigned a storage stack where similar-sounding candidate words are stored. Multiple stack decoding, and likelihood envelope criteria for word path extension decisions, are further features of the system.

2. Granted invention patent

US5455889A Labelling speech using context-dependent acoustic prototypes (Expired)
Publication No.: US5455889A
Publication date: 1995-10-03
Application No.: US14966
Filing date: 1993-02-08
Applicant: Lalit R. Bahl, Peter de Souza, P. S. Gopalakrishnan, Michael A. Picheny
Inventor: Lalit R. Bahl, Peter de Souza, P. S. Gopalakrishnan, Michael A. Picheny
IPC classification: G10L15/10, G10L15/02, G10L15/14, G10L15/18, G10L15/28, G10L5/06, G10L9/00
CPC classification: G10L15/142
Abstract: The present invention relates to labelling of speech in a context-dependent speech recognition system. When labelling speech using context-dependent prototypes the phone context of a frame of speech needs to be aligned with the appropriate acoustic parameter vector. Since aligning a large amount of data is difficult if based upon arc ranks, the present invention aligns the data using context-independent acoustic prototypes. The phonetic context of each phone of the data is known. Therefore after the alignment step the acoustic parameter vectors are tagged with a corresponding phonetic context. Context-dependent prototype vectors exist for each label. For all labels the context-dependent prototype vectors having the same phonetic context as the tagged acoustic parameter vector are determined. For each label the probability of achieving the tagged acoustic parameter vector is determined given each of the context-dependent label prototype vectors having the same phonetic context as the tagged acoustic parameter vector. The label with the highest probability is associated with the context-dependent acoustic parameter vector.
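The final labelling step described in this abstract can be illustrated with a small sketch. It assumes each context-dependent prototype is a diagonal-covariance Gaussian, which the abstract does not state; the names `Prototype`, `log_likelihood`, and `label_frame`, the context strings, and the toy data are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Prototype:
    label: str          # label this prototype belongs to
    context: str        # phonetic context the prototype was built for
    mean: list[float]
    var: list[float]    # diagonal variances (an assumption of this sketch)


def log_likelihood(vec, proto):
    """Log density of vec under a diagonal-Gaussian prototype."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(vec, proto.mean, proto.var)
    )


def label_frame(vec, context, prototypes):
    """Pick the label whose prototype for this context best explains vec."""
    candidates = [p for p in prototypes if p.context == context]
    if not candidates:
        raise ValueError(f"no prototypes for context {context!r}")
    best = max(candidates, key=lambda p: log_likelihood(vec, p))
    return best.label


# Toy data: two labels share the context "k_t"; one label also has "s_n".
prototypes = [
    Prototype("AA1", "k_t", [0.0, 0.0], [1.0, 1.0]),
    Prototype("IY1", "k_t", [3.0, 3.0], [1.0, 1.0]),
    Prototype("AA1", "s_n", [3.0, 3.0], [1.0, 1.0]),
]
```

The same acoustic vector can receive different labels depending on its tagged context, because only prototypes with a matching context compete.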

3. Granted invention patent

US5182773A Speaker-independent label coding apparatus (Expired)
Publication No.: US5182773A
Publication date: 1993-01-26
Application No.: US673810
Filing date: 1991-03-22
Applicant: Lalit R. Bahl, Michael A. Picheny, David Nahamoo, Peter V. de Souza
Inventor: Lalit R. Bahl, Michael A. Picheny, David Nahamoo, Peter V. de Souza
IPC classification: G10L19/00, G10L15/02, G10L19/02, H03M7/30
CPC classification: H03M7/3082, G10L19/038
Abstract: The present invention is related to speech recognition and particularly to a new type of vector quantizer and a new vector quantization technique in which the error rate of associating a sound with an incoming speech signal is drastically reduced. To achieve this end, the present invention technique groups the feature vectors in a space into different prototypes at least two of which represent a class of sound. Each of the prototypes may in turn have a number of subclasses or partitions. Each of the prototypes and their subclasses may be assigned respective identifying values. To identify an incoming speech feature vector, at least one of the feature values of the incoming feature vector is compared with the different values of the respective prototypes, or the subclasses of the prototypes. The class of sound whose group of prototypes, or at least one of the prototypes, whose combined value most closely matches the value of the feature value of the feature vector is deemed to be the class corresponding to the feature vector. The feature vector is then labeled with the identifier associated with that class.
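A rough sketch of the idea that one sound class may own several prototypes: here a vector is labeled with the class of its single nearest prototype under Euclidean distance. Both the distance measure and the nearest-prototype rule are assumptions of this sketch, not details given in the abstract; `label_vector` and the class identifiers are hypothetical.

```python
import math


def label_vector(vec, class_prototypes):
    """Return the class id owning the prototype closest to vec.

    class_prototypes maps a class id to a list of prototype vectors.
    """
    best_class, best_dist = None, math.inf
    for class_id, protos in class_prototypes.items():
        for proto in protos:
            d = math.dist(vec, proto)
            if d < best_dist:
                best_class, best_dist = class_id, d
    return best_class


class_prototypes = {
    "S": [[0.0, 0.0], [4.0, 4.0]],  # one class, two dissimilar prototypes
    "F": [[2.0, 2.0]],
}
```

Allowing a class to have several prototypes lets acoustically different realizations of the same sound map to the same label.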

4. Granted invention patent

US5165007A Feneme-based Markov models for words (Expired)
Publication No.: US5165007A
Publication date: 1992-11-17
Application No.: US366231
Filing date: 1989-06-12
Applicant: Lalit R. Bahl, Peter V. DeSouza, Robert L. Mercer, Michael A. Picheny
Inventor: Lalit R. Bahl, Peter V. DeSouza, Robert L. Mercer, Michael A. Picheny
IPC classification: G10L15/02, G10L15/06, G10L15/14
CPC classification: G10L15/142, G10L2015/0631
Abstract: In a speech recognition system, apparatus and method for modelling words with label-based Markov models is disclosed. The modelling includes: entering a first speech input, corresponding to words in a vocabulary, into an acoustic processor which converts each spoken word into a sequence of standard labels, where each standard label corresponds to a sound type assignable to an interval of time; representing each standard label as a probabilistic model which has a plurality of states, at least one transition from a state to a state, and at least one settable output probability at some transitions; entering selected acoustic inputs into an acoustic processor which converts the selected acoustic inputs into personalized labels, each personalized label corresponding to a sound type assigned to an interval of time; and setting each output probability as the probability of the standard label represented by a given model producing a particular personalized label at a given transition in the given model. The present invention addresses the problem of generating models of words simply and automatically in a speech recognition system.

5. Granted invention patent

US4852173A Design and construction of a binary-tree system for language modelling (Expired)
Publication No.: US4852173A
Publication date: 1989-07-25
Application No.: US114892
Filing date: 1987-10-29
Applicant: Lalit R. Bahl, Peter F. Brown, Peter V. deSouza, Robert L. Mercer
Inventor: Lalit R. Bahl, Peter F. Brown, Peter V. deSouza, Robert L. Mercer
IPC classification: G10L11/00, G06F15/18, G06F17/18, G10L15/00, G10L15/10, G10L15/18
CPC classification: G06N99/005, G06F17/18, G10L15/00
Abstract: In order to determine a next event based upon available data, a binary decision tree is constructed having true or false questions at each node and a probability distribution of the unknown next event based upon available data at each leaf. Starting at the root of the tree, the construction process proceeds from node-to-node towards a leaf by answering the question at each node encountered and following either the true or false path depending upon the answer. The questions are phrased in terms of the available data and are designed to provide as much information as possible about the next unknown event. The process is particularly useful in speech recognition when the next word to be spoken is determined on the basis of the previously spoken words.
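The root-to-leaf walk described in this abstract can be sketched as follows. The sketch covers only the traversal of an already built tree, not how the questions are chosen; the `Node` class, the `predict` function, and the toy questions and probabilities are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    # Internal nodes carry a true/false question about the word history.
    question: Optional[Callable[[list[str]], bool]] = None
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    # Leaves carry a probability distribution over the next word.
    distribution: Optional[dict[str, float]] = None


def predict(root, history):
    """Follow true/false answers from the root until a leaf is reached."""
    node = root
    while node.distribution is None:
        node = node.yes if node.question(history) else node.no
    return node.distribution


tree = Node(
    question=lambda h: h[-1] == "the",
    yes=Node(distribution={"cat": 0.6, "dog": 0.4}),
    no=Node(
        question=lambda h: h[-1] in {"a", "an"},
        yes=Node(distribution={"apple": 0.5, "cat": 0.5}),
        no=Node(distribution={"the": 0.7, "a": 0.3}),
    ),
)
```

For example, `predict(tree, ["see", "the"])` answers the root question with "true" and returns the distribution stored at that leaf.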

6. Granted invention patent

US06377921B1 Identifying mismatches between assumed and actual pronunciations of words (Expired)
Publication No.: US06377921B1
Publication date: 2002-04-23
Application No.: US09105763
Filing date: 1998-06-26
Applicant: Lalit R. Bahl, Mukund Padmanabhan
Inventor: Lalit R. Bahl, Mukund Padmanabhan
IPC classification: G10L15/06
CPC classification: G10L15/063, G10L2015/0631
Abstract: A method of identifying mismatches between acoustic data and a corresponding transcription, the transcription being expressed in terms of basic units, comprises the steps of: aligning the acoustic data with the corresponding transcription; computing a probability score for each instance of a basic unit in the acoustic data with respect to the transcription; generating a distribution for each basic unit; tagging, as mismatches, instances of a basic unit corresponding to a particular range of scores in the distribution for each basic unit based on a threshold value; and correcting the mismatches.
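The scoring and tagging steps can be sketched as below, assuming the alignment has already produced one score per unit instance. The "particular range of scores" is modeled here as the low tail of each unit's score distribution, measured in standard deviations below the mean; that choice, the function name `tag_mismatches`, and the default threshold are assumptions of this sketch.

```python
from collections import defaultdict
from statistics import mean, pstdev


def tag_mismatches(instances, z_threshold=1.5):
    """Flag instances scoring unusually low for their basic unit.

    instances: list of (unit, score) pairs, one per aligned instance.
    Returns the indices of instances tagged as mismatches.
    """
    # Build a per-unit score distribution (summarized by mean and std).
    by_unit = defaultdict(list)
    for unit, score in instances:
        by_unit[unit].append(score)
    stats = {u: (mean(s), pstdev(s)) for u, s in by_unit.items()}

    flagged = []
    for i, (unit, score) in enumerate(instances):
        mu, sigma = stats[unit]
        if sigma > 0 and (score - mu) / sigma < -z_threshold:
            flagged.append(i)
    return flagged


instances = [
    ("AH", -2.0), ("AH", -2.1), ("AH", -1.9), ("AH", -2.0), ("AH", -9.0),
    ("T", -1.0), ("T", -1.1), ("T", -0.9),
]
```

Keeping a separate distribution per unit matters because some units score lower than others even when correctly transcribed.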

7. Granted invention patent

US5497447A Speech coding apparatus having acoustic prototype vectors generated by tying to elementary models and clustering around reference vectors (Expired)
Publication No.: US5497447A
Publication date: 1996-03-05
Application No.: US28028
Filing date: 1993-03-08
Applicant: Lalit R. Bahl, Ponani S. Gopalakrishnan, Michael A. Picheny, Peter D. De Souza
Inventor: Lalit R. Bahl, Ponani S. Gopalakrishnan, Michael A. Picheny, Peter D. De Souza
IPC classification: G10L19/00, G10L15/02, G10L15/06, G10L9/00
CPC classification: G10L15/063
Abstract: A speech coding apparatus in which measured acoustic feature vectors are each represented by the best matched prototype vector. The prototype vectors are generated by storing a model of a training script comprising a series of elementary models. The value of at least one feature of a training utterance of the training script is measured over each of a series of successive time intervals to produce a series of training feature vectors. A first set of training feature vectors corresponding to a first elementary model in the training script is identified. The feature value of each training feature vector signal in the first set is compared to the parameter value of a first reference vector signal to obtain a first closeness score, and is compared to the parameter value of a second reference vector to obtain a second closeness score for each training feature vector. For each training feature vector in the first set, the first closeness score is compared with the second closeness score to obtain a reference match score. A first subset contains those training feature vectors in the first set having reference match scores better than a threshold Q, and a second subset contains those having reference match scores less than the threshold Q. One or more partition values are generated for a first prototype vector from the first subset of training feature vectors, and one or more additional partition values are generated for the first prototype vector from the second subset of training feature vectors.
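The subset split around the threshold Q can be sketched as follows. The abstract does not define the closeness score or how the two scores are compared, so this sketch assumes Euclidean distance and takes the reference match score as the difference of the two distances (positive when a vector is closer to the first reference). Using the subset centroid as a partition value is likewise an assumption, and `split_by_reference_match` and `centroid` are hypothetical names.

```python
import math


def split_by_reference_match(vectors, ref1, ref2, q):
    """Split vectors into two subsets by reference match score vs. q."""
    first, second = [], []
    for v in vectors:
        d1 = math.dist(v, ref1)   # first closeness score (distance)
        d2 = math.dist(v, ref2)   # second closeness score (distance)
        score = d2 - d1           # assumed score: higher means closer to ref1
        (first if score > q else second).append(v)
    return first, second


def centroid(vectors):
    """Component-wise mean, used here as a stand-in partition value."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


vectors = [[0.0, 0.0], [0.2, 0.0], [1.0, 0.0], [0.8, 0.0]]
first, second = split_by_reference_match(vectors, [0.0, 0.0], [1.0, 0.0], q=0.0)
```

Each subset then contributes its own partition value to the same prototype vector, so the prototype can cover two distinct regions of the feature space.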

8. Granted invention patent

US4817156A Rapidly training a speech recognizer to a subsequent speaker given training data of a reference speaker (Expired)
Publication No.: US4817156A
Publication date: 1989-03-28
Application No.: US84712
Filing date: 1987-08-10
Applicant: Lalit R. Bahl, Robert L. Mercer, David Nahamoo
Inventor: Lalit R. Bahl, Robert L. Mercer, David Nahamoo
IPC classification: G10L11/00, G10L15/06, G10L15/14, G01L5/00
CPC classification: G10L15/14
Abstract: Apparatus and method for training the statistics of a Markov Model speech recognizer to a subsequent speaker who utters part of a training text after the recognizer has been trained for the statistics of a reference speaker who utters a full training text. Where labels generated by an acoustic processor in response to uttered speech serve as outputs for Markov models, the present apparatus and method determine label output probabilities at transitions in the Markov models corresponding to the subsequent speaker where there is sparse training data. Specifically, label output probabilities for the subsequent speaker are re-parameterized based on confusion matrix entries having values indicative of the similarity between an lth label output of the subsequent speaker and a kth label output for the reference speaker. The label output probabilities based on re-parameterized data are combined with initialized label output probabilities to form "smoothed" label output probabilities which feature smoothed probability distributions. Based on label outputs generated when the subsequent speaker utters the shortened training text, "basic" label output probabilities computed by conventional methodology are linearly averaged against the smoothed label output probabilities to produce improved label output probabilities.
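The last step, linearly averaging "basic" against "smoothed" label output probabilities, can be sketched as a simple interpolation. The abstract does not say how the averaging weight is chosen, so `weight` is a free parameter here; the function name `interpolate` and the toy distributions are hypothetical.

```python
def interpolate(basic, smoothed, weight):
    """Linearly average two label output distributions.

    basic, smoothed: dicts mapping label -> probability at one transition.
    weight: share given to the basic estimate, between 0 and 1.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    labels = basic.keys() | smoothed.keys()
    return {
        label: weight * basic.get(label, 0.0)
        + (1.0 - weight) * smoothed.get(label, 0.0)
        for label in labels
    }


# With sparse data the basic estimate may put all mass on one label;
# the smoothed estimate spreads some of it to similar labels.
basic = {"L1": 1.0, "L2": 0.0}
smoothed = {"L1": 0.5, "L2": 0.3, "L3": 0.2}
improved = interpolate(basic, smoothed, weight=0.5)
```

Because both inputs sum to one, any weight in the allowed range yields a distribution that also sums to one.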

9. Granted invention patent

US4718094A Speech recognition system (Expired)
Publication No.: US4718094A
Publication date: 1988-01-05
Application No.: US845155
Filing date: 1986-03-27
Applicant: Lalit R. Bahl, Peter V. deSouza, Steven V. DeGennaro, Robert L. Mercer
Inventor: Lalit R. Bahl, Peter V. deSouza, Steven V. DeGennaro, Robert L. Mercer
IPC classification: G10L11/00, G10L15/08, G10L15/10, G10L15/12, G10L15/14, G10L15/18, G10L5/00
CPC classification: G10L15/142, G10L15/08, G10L15/144, G10L15/187
Abstract: Speech words are recognized by first recognizing each spectral vector identified by a label (feneme), then identifying the word by matching the string of labels against phones using simplified phone machines based on label and transition probabilities and Markov chains. In one embodiment, a detailed acoustic match word score is combined with an approximate acoustic match word score to provide a total word score for a subject word. In another embodiment, a polling word score is combined with an acoustic match word score to provide a total word score for a subject word. The acoustic models employed in the acoustic matching may correspond, alternatively, to phonetic elements or to fenemes. Fenemes represent labels generated by an acoustic processor in response to a spoken input. Apparatus and method for determining word scores according to approximate acoustic matching and for determining word scores according to a polling methodology are disclosed.

10. Granted invention patent

US5276766A Fast algorithm for deriving acoustic prototypes for automatic speech recognition (Expired)
Publication No.: US5276766A
Publication date: 1994-01-04
Application No.: US730714
Filing date: 1991-07-16
Applicant: Lalit R. Bahl, Jerome R. Bellegarda, Peter V. DeSouza, David Nahamoo, Michael A. Picheny
Inventor: Lalit R. Bahl, Jerome R. Bellegarda, Peter V. DeSouza, David Nahamoo, Michael A. Picheny
IPC classification: G10L19/00, G10L15/02, G10L15/06, G10L9/04
CPC classification: G10L15/063
Abstract: An apparatus for generating a set of acoustic prototype signals for encoding speech includes a memory for storing a training script model comprising a series of word-segment models. Each word-segment model comprises a series of elementary models. An acoustic measure is provided for measuring the value of at least one feature of an utterance of the training script during each of a series of time intervals to produce a series of feature vector signals representing the feature values of the utterance. An acoustic matcher is provided for estimating at least one path through the training script model which would produce the entire series of measured feature vector signals. From the estimated path, the elementary model in the training script model which would produce each feature vector signal is estimated. The apparatus further comprises a cluster processor for clustering the feature vector signals into a plurality of clusters. Each feature vector signal in a cluster corresponds to a single elementary model in a single location in a single word-segment model. Each cluster signal has a cluster value equal to an average of the feature values of all feature vectors in the signal. Finally, the apparatus includes a memory for storing a plurality of prototype vector signals. Each prototype vector signal corresponds to an elementary model, has an identifier, and comprises at least two partition values. The partition values are equal to combinations of the cluster values of one or more cluster signals corresponding to the elementary model.
