Quick Patent Search - Fast search of global patents, free commercial patent database - IPRDB

1. Granted invention patent

US5233681A Context-dependent speech recognizer using estimated next word context (Expired)
Publication No.: US5233681A
Publication date: 1993-08-03
Application No.: US874271
Filing date: 1992-04-24
Applicant: Lalit R. Bahl, Peter V. De Souza, Ponani S. Gopalakrishnan, Michael A. Picheny
Inventor: Lalit R. Bahl, Peter V. De Souza, Ponani S. Gopalakrishnan, Michael A. Picheny
IPC classification: G10L15/10, G10L15/18, G10L15/28
CPC classification: G10L15/19, G10L15/193
Abstract: A speech recognition apparatus and method estimates the next word context for each current candidate word in a speech hypothesis. An initial model of each speech hypothesis comprises a model of a partial hypothesis of zero or more words followed by a model of a candidate word. An initial hypothesis score for each speech hypothesis comprises an estimate of the closeness of a match between the initial model of the speech hypothesis and a sequence of coded representations of the utterance. The speech hypotheses having the best initial hypothesis scores form an initial subset. For each speech hypothesis in the initial subset, the word which is most likely to follow the speech hypothesis is estimated. A revised model of each speech hypothesis in the initial subset comprises a model of the partial hypothesis followed by a revised model of the candidate word. The revised candidate word model is dependent at least on the word which is estimated to be most likely to follow the speech hypothesis. A revised hypothesis score for each speech hypothesis in the initial subset comprises an estimate of the closeness of a match between the revised model of the speech hypothesis and the sequence of coded representations of the utterance. The speech hypotheses from the initial subset which have the best revised match scores are stored as a reduced subset. At least one word of one or more of the speech hypotheses in the reduced subset is output as a speech recognition result.
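
The two-pass selection this abstract describes can be sketched roughly as follows. This is only an illustration: the function names, the toy score tables, and the convention that higher scores are better are all assumptions, and the real acoustic models are replaced by dictionary lookups.

```python
def two_pass_decode(hypotheses, fast_score, predict_next_word,
                    detailed_score, keep_initial, keep_final):
    """Keep the best hypotheses after a context-dependent second pass.

    All callables are hypothetical stand-ins for the models in the abstract.
    Higher scores are assumed to mean a closer match.
    """
    # Pass 1: rank by the initial (context-independent) score.
    initial = sorted(hypotheses, key=fast_score, reverse=True)[:keep_initial]
    # Pass 2: rescore each survivor using its estimated next word.
    rescored = [(detailed_score(h, predict_next_word(h)), h) for h in initial]
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return [h for _, h in rescored[:keep_final]]


# Toy data: each hypothesis is (partial hypothesis, candidate word).
hyps = [("the", "cat"), ("the", "cap"), ("the", "cut"), ("a", "mat")]
fast = {("the", "cat"): 0.9, ("the", "cap"): 0.8,
        ("the", "cut"): 0.7, ("a", "mat"): 0.1}
next_word = {("the", "cat"): "sat", ("the", "cap"): "sat", ("the", "cut"): "sat"}
detailed = {(("the", "cat"), "sat"): 0.6,
            (("the", "cap"), "sat"): 0.95,
            (("the", "cut"), "sat"): 0.2}

result = two_pass_decode(hyps, fast.get, next_word.get,
                         lambda h, w: detailed[(h, w)], 3, 2)
```

In this toy run the first pass prefers "cat", but the revised score that takes the estimated next word into account moves "cap" to the top of the reduced subset.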

2. Granted invention patent

US5222146A Speech recognition apparatus having a speech coder outputting acoustic prototype ranks (Expired)
Publication No.: US5222146A
Publication date: 1993-06-22
Application No.: US781440
Filing date: 1991-10-23
Applicant: Lalit R. Bahl, Peter V. De Souza, Ponani S. Gopalakrishnan, Michael A. Picheny
Inventor: Lalit R. Bahl, Peter V. De Souza, Ponani S. Gopalakrishnan, Michael A. Picheny
IPC classification: G10L15/02, G10L15/14, G10L19/00
CPC classification: G10L15/02, G10L19/0018
Abstract: A speech coding and speech recognition apparatus. The value of at least one feature of an utterance is measured over each of a series of successive time intervals to produce a series of feature vector signals. The closeness of the feature value of each feature vector signal to the parameter value of each of a set of prototype vector signals is determined to obtain prototype match scores for each vector signal and each prototype vector signal. For each feature vector signal, first-rank and second-rank scores are associated with the prototype vector signals having the best and second best prototype match scores, respectively. For each feature vector signal, at least the identification value and the rank score of the first-ranked and second-ranked prototype vector signals are output as a coded utterance representation signal of the feature vector signal, to produce a series of coded utterance representation signals. For each of a plurality of speech units, a probabilistic model has a plurality of model outputs, and output probabilities for each model output. Each model output comprises the identification value of a prototype vector and a rank score. For each speech unit, a match score comprises an estimate of the probability that the probabilistic model of the speech unit would output a series of model outputs matching a reference series comprising the identification value and rank score of at least one prototype vector from each coded utterance representation signal in the series of coded utterance representation signals.
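
The rank-based coding step can be illustrated with a minimal sketch. It assumes Euclidean distance as the prototype match score (smaller is better), which the abstract does not specify; the prototype identifiers and values are made up for the example.

```python
from math import dist


def rank_code(feature, prototypes, top_n=2):
    """Return [(prototype_id, rank), ...] for the top_n closest prototypes.

    Assumption: closeness is Euclidean distance; rank 1 is the best match.
    """
    ordered = sorted(prototypes.items(), key=lambda item: dist(feature, item[1]))
    return [(proto_id, rank)
            for rank, (proto_id, _) in enumerate(ordered[:top_n], start=1)]


def code_utterance(features, prototypes):
    """Code every feature vector of an utterance as (id, rank) pairs."""
    return [rank_code(f, prototypes) for f in features]


# Hypothetical prototypes and two feature vectors.
prototypes = {"P1": (0.0, 0.0), "P2": (1.0, 1.0), "P3": (4.0, 4.0)}
features = [(0.1, 0.2), (3.0, 3.5)]
coded = code_utterance(features, prototypes)
```

Each frame is reduced to the identities and ranks of its best and second-best prototypes, which is the form of output the abstract says the probabilistic models are matched against.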

3. Granted invention patent

US5129001A Method and apparatus for modeling words with multi-arc markov models (Expired)
Publication No.: US5129001A
Publication date: 1992-07-07
Application No.: US514075
Filing date: 1990-04-25
Applicant: Lalit R. Bahl, Jerome R. Bellegarda, Peter V. De Souza, Ponani S. Gopalakrishnan, David Nahamoo, Michael A. Picheny
Inventor: Lalit R. Bahl, Jerome R. Bellegarda, Peter V. De Souza, Ponani S. Gopalakrishnan, David Nahamoo, Michael A. Picheny
IPC classification: G06F7/00, G06F17/18, G10L15/06, G10L15/10, G10L15/14
CPC classification: G10L15/144
Abstract: Modeling a word is done by concatenating a series of elemental models to form a word model. At least one elemental model in the series is a composite elemental model formed by combining the starting states of at least first and second primitive elemental models. Each primitive elemental model represents a speech component. The primitive elemental models are combined by a weighted combination of their parameters in proportion to the values of the weighting factors. To tailor the word model to closely represent variations in the pronunciation of the word, the word is uttered a plurality of times by a plurality of different speakers. Constructing word models from composite elemental models, and constructing composite elemental models from primitive elemental models enables word models to represent many variations in the pronunciation of a word. Providing a relatively small set of primitive elemental models for a relatively large vocabulary of words enables models to be trained to the voice of a new speaker by having the new speaker utter only a small subset of the words in the vocabulary.

4. Granted invention patent

US5195167A Apparatus and method of grouping utterances of a phoneme into context-dependent categories based on sound-similarity for automatic speech recognition (Expired)
Publication No.: US5195167A
Publication date: 1993-03-16
Application No.: US871600
Filing date: 1992-04-17
Applicant: Lalit R. Bahl, Peter V. De Souza, Ponani S. Gopalakrishnan, David Nahamoo, Michael A. Picheny
Inventor: Lalit R. Bahl, Peter V. De Souza, Ponani S. Gopalakrishnan, David Nahamoo, Michael A. Picheny
IPC classification: G06F7/38, G06F17/27, G10L11/00, G10L15/02, G10L15/06, G10L15/10, G10L15/18
CPC classification: G10L15/063
Abstract: Symbol feature values and contextual feature values of each event in a training set of events are measured. At least two pairs of complementary subsets of observed events are selected. In each pair of complementary subsets of observed events, one subset has contextual features with values in a set $C_n$, and the other subset has contextual features with values in a set $\bar{C}_n$, where the sets $C_n$ and $\bar{C}_n$ are complementary sets of contextual feature values. For each subset of observed events, the similarity values of the symbol features of the observed events in the subsets are calculated. For each pair of complementary sets of observed events, a "goodness of fit" is the sum of the symbol feature value similarity of the subsets. The sets of contextual feature values associated with the subsets of observed events having the best "goodness of fit" are identified and form context-dependent bases for grouping the observed events into two output sets.
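
A minimal sketch of the split selection follows. The similarity measure used here (negative sum of squared deviations from the subset mean) is an assumption chosen for illustration; the abstract does not say how similarity is computed. Events are modeled as hypothetical `(symbol_value, context)` pairs.

```python
def similarity(values):
    """Assumed similarity: negative sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return -sum((v - mean) ** 2 for v in values)


def best_context_split(events, candidate_sets):
    """Pick the context set whose split gives the best goodness of fit.

    events: list of (symbol_value, context) pairs.
    candidate_sets: iterable of context sets C_n; the complement is implied.
    Returns (goodness_of_fit, chosen_set), or None if no split is usable.
    """
    best = None
    for c_n in candidate_sets:
        inside = [v for v, ctx in events if ctx in c_n]
        outside = [v for v, ctx in events if ctx not in c_n]
        if not inside or not outside:
            continue  # skip splits that leave one subset empty
        fit = similarity(inside) + similarity(outside)
        if best is None or fit > best[0]:
            best = (fit, frozenset(c_n))
    return best


# Toy events: symbol feature value plus the context it occurred in.
events = [(1.0, "a"), (1.2, "a"), (5.0, "b"), (5.1, "c"), (0.9, "d")]
candidates = [{"a"}, {"a", "d"}, {"b"}]
fit, chosen = best_context_split(events, candidates)
```

With this toy data the chosen set is `{"a", "d"}`, because grouping those contexts together leaves both subsets internally homogeneous.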

5. Granted invention patent

US06493667B1 Enhanced likelihood computation using regression in a speech recognition system (Expired)
Publication No.: US06493667B1
Publication date: 2002-12-10
Application No.: US09368669
Filing date: 1999-08-05
Applicant: Peter V. de Souza, Yuqing Gao, Michael Picheny, Bhuvana Ramabhadran
Inventor: Peter V. de Souza, Yuqing Gao, Michael Picheny, Bhuvana Ramabhadran
IPC classification: G10L15/14
CPC classification: G10L15/144, G10L2015/085
Abstract: In order to achieve low error rates in a speech recognition system, for example, in a system employing rank-based decoding, we discriminate the most confusable incorrect leaves from the correct leaf by lowering their ranks. That is, we increase the likelihood of the correct leaf of a frame, while decreasing the likelihoods of the confusable leaves. In order to do this, we use the auxiliary information from the prediction of the neighboring frames to augment the likelihood computation of the current frame. We then use the residual errors in the predictions of neighboring frames to discriminate between the correct (best) and incorrect leaves of a given frame. We present a new methodology that incorporates prediction error likelihoods into the overall likelihood computation to improve the rank position of the correct leaf.

6. Granted invention patent

US5884261A Method and apparatus for tone-sensitive acoustic modeling (Expired)
Publication No.: US5884261A
Publication date: 1999-03-16
Application No.: US271639
Filing date: 1994-07-07
Applicant: Peter V. de Souza, Adam B. Fineberg, Hsiao-Wuen Hon, Baosheng Yuan
Inventor: Peter V. de Souza, Adam B. Fineberg, Hsiao-Wuen Hon, Baosheng Yuan
IPC classification: G10L11/04, G10L15/02, G10L15/14, G10L15/18, G10L9/00
CPC classification: G10L15/144, G10L25/15, G10L25/90
Abstract: Tone-sensitive acoustic models are generated by first generating acoustic vectors which represent the input data. The input data is separated into multiple frames and an acoustic vector is generated for each frame which represents the input data over its corresponding frame. A tone-sensitive parameter is then generated for each of the frames which indicates the tone of the input data at its corresponding frame. Tone-sensitive parameters are generated in accordance with two embodiments. First, a pitch detector may be used to calculate a pitch for each of the frames. If a pitch cannot be detected for a particular frame, then a pitch is created for that frame based on the pitch values of surrounding frames. Second, the cross covariance between the autocorrelation coefficients for each frame and its successive frame may be generated and used as the tone-sensitive parameter. Feature vectors are then created for each frame by appending the tone-sensitive parameter for a frame to the acoustic vector for the same frame. Then, using these feature vectors, acoustic models are created which represent the input data.
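
The first embodiment (pitch per frame, with gaps filled from surrounding frames) can be sketched as below. Linear interpolation between the nearest detected neighbours is an assumption; the abstract only says the missing pitch is "based on the pitch values of surrounding frames". Undetected pitch is represented here by `None`, and all values are made up.

```python
def fill_missing_pitch(pitches):
    """Fill None entries from neighbouring frames.

    Assumed scheme: linear interpolation between the nearest detected
    frames; gaps at either end copy the nearest detected value.
    """
    known = [i for i, p in enumerate(pitches) if p is not None]
    if not known:
        raise ValueError("no frame has a detected pitch")
    filled = list(pitches)
    for i, p in enumerate(pitches):
        if p is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = pitches[right]
        elif right is None:
            filled[i] = pitches[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = (1 - w) * pitches[left] + w * pitches[right]
    return filled


def append_tone(acoustic_vectors, pitches):
    """Append the tone-sensitive parameter to each frame's acoustic vector."""
    return [list(vec) + [p]
            for vec, p in zip(acoustic_vectors, fill_missing_pitch(pitches))]


acoustic = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
pitch = [100.0, None, None, 130.0]  # two frames with no detected pitch
feature_vectors = append_tone(acoustic, pitch)
```

The resulting feature vectors have one extra dimension per frame, which is what the acoustic models would then be trained on.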

7. Granted invention patent

US5615299A Speech recognition using dynamic features (Expired)
Publication No.: US5615299A
Publication date: 1997-03-25
Application No.: US262093
Filing date: 1994-06-20
Applicant: Lalit R. Bahl, Peter V. de Souza, Ponani Gopalakrishnan, Michael A. Picheny
Inventor: Lalit R. Bahl, Peter V. de Souza, Ponani Gopalakrishnan, Michael A. Picheny
IPC classification: G10L11/00, G10L15/02, G10L15/04, G10L15/14, G10L15/20, G10L19/00, G10L21/02, G10L7/08, G10L5/06
CPC classification: G10L19/0018, G10L15/02
Abstract: A speech recognition technique utilizes a set of N different principal discriminant matrices. Each principal discriminant matrix is associated with a distinct class. The class is an indication of the proximity of a speech segment to neighboring phones. A technique for speech encoding includes arranging a speech signal into a series of frames. A feature vector is derived which represents the speech signal for a speech segment or series of speech segments for each frame. A set of N different projected vectors is generated for each frame, by multiplying the principal discriminant matrices by the vector. This speech encoding technique is capable of being used in speech recognition systems by utilizing models, in which each model transition is tagged with one of the N classes. The projected vector is utilized with the corresponding tag to compute the probability that at least one particular speech part is present in said frame.
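
The per-class projection step amounts to one matrix-vector product per class. The sketch below uses plain Python lists; the class names and matrix values are hypothetical, and how the principal discriminant matrices are obtained is outside its scope.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def project_frame(feature_vector, class_matrices):
    """Return {class_id: projected_vector} for one frame.

    class_matrices maps each of the N classes to its (assumed already
    trained) principal discriminant matrix.
    """
    return {cls: mat_vec(mat, feature_vector)
            for cls, mat in class_matrices.items()}


# Two hypothetical classes projecting a 3-dimensional frame down to 2.
class_matrices = {
    "class_0": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    "class_1": [[0.5, 0.5, 0.0], [0.0, 0.0, 2.0]],
}
frame = [2.0, 4.0, 1.0]
projected = project_frame(frame, class_matrices)
```

A model transition tagged with `"class_1"` would then be scored against `projected["class_1"]` rather than against the raw feature vector.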

8. Granted invention patent

US5072452A Automatic determination of labels and Markov word models in a speech recognition system (Expired)
Publication No.: US5072452A
Publication date: 1991-12-10
Application No.: US431720
Filing date: 1989-11-02
Applicant: Peter F. Brown, Peter V. De Souza, David Nahamoo, Michael A. Picheny
Inventor: Peter F. Brown, Peter V. De Souza, David Nahamoo, Michael A. Picheny
IPC classification: G10L15/14
CPC classification: G10L15/14
Abstract: In a Markov model speech recognition system, an acoustic processor generates one label after another selected from an alphabet of labels. Each vocabulary word is represented as a baseform constructed of a sequence of Markov models. Each Markov model is stored in a computer memory as (a) a plurality of states; (b) a plurality of arcs, each extending from a state to a state with a respective stored probability; and (c) stored label output probabilities, each indicating the likelihood of a given label being produced at a certain arc. Word likelihood based on acoustic characteristics is determined by matching a string of labels generated by the acoustic processor against the probabilities stored for each word baseform. Improved models of words are obtained by specifying label parameters and constructing word baseforms interdependently and iteratively.

9. Granted invention patent

US5333236A Speech recognizer having a speech coder for an acoustic match based on context-dependent speech-transition acoustic models (Expired)
Publication No.: US5333236A
Publication date: 1994-07-26
Application No.: US942862
Filing date: 1992-09-10
Applicant: Lalit R. Bahl, Peter V. De Souza, Ponani S. Gopalakrishnan, Michael A. Picheny
Inventor: Lalit R. Bahl, Peter V. De Souza, Ponani S. Gopalakrishnan, Michael A. Picheny
IPC classification: G10L15/10, G10L15/14, G10L15/18, G10L19/00, G10L19/04, G10L19/06, G10L19/08, G10L9/00
CPC classification: G10L19/06
Abstract: A speech coding apparatus compares the closeness of the feature value of a feature vector signal of an utterance to the parameter values of prototype vector signals to obtain prototype match scores for the feature vector signal and each prototype vector signal. The speech coding apparatus stores a plurality of speech transition models representing speech transitions. At least one speech transition is represented by a plurality of different models. Each speech transition model has a plurality of model outputs, each comprising a prototype match score for a prototype vector signal. Each model output has an output probability. A model match score for a first feature vector signal and each speech transition model comprises the output probability for at least one prototype match score for the first feature vector signal and a prototype vector signal. A speech transition match score for the first feature vector signal and each speech transition comprises the best model match score for the first feature vector signal and all speech transition models representing the speech transition. The identification value of each speech transition and the speech transition match score for the first feature vector signal and each speech transition are output as a coded utterance representation signal of the first feature vector signal.
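
The last step of this abstract, taking the best model match score among all models that represent the same speech transition, can be sketched in a few lines. The transition names, model identifiers, scores, and the convention that higher is better are assumptions made for the example; computing the model match scores themselves is not shown.

```python
def code_frame(model_scores, models_by_transition):
    """Return {transition_id: best model match score} for one frame.

    model_scores: {model_id: match score for this frame}, higher assumed better.
    models_by_transition: {transition_id: [model_id, ...]}.
    """
    return {
        transition: max(model_scores[m] for m in models)
        for transition, models in models_by_transition.items()
    }


# One transition has two context-dependent models, the other has one.
models_by_transition = {"AA->B": ["m1", "m2"], "B->K": ["m3"]}
model_scores = {"m1": 0.2, "m2": 0.7, "m3": 0.4}
coded_frame = code_frame(model_scores, models_by_transition)
```

The dictionary of transition identifiers and their match scores plays the role of the coded utterance representation signal for that frame.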

10. Granted invention patent

US5278942A Speech coding apparatus having speaker dependent prototypes generated from nonuser reference data (Expired)
Publication No.: US5278942A
Publication date: 1994-01-11
Application No.: US802678
Filing date: 1991-12-05
Applicant: Lalit R. Bahl, Jerome R. Bellegarda, Peter V. De Souza, Ponani S. Gopalakrishnan, Arthur J. Nadas, David Nahamoo, Michael A. Picheny
Inventor: Lalit R. Bahl, Jerome R. Bellegarda, Peter V. De Souza, Ponani S. Gopalakrishnan, Arthur J. Nadas, David Nahamoo, Michael A. Picheny
IPC classification: G10L19/00, G10L15/02, G10L15/06, G10L15/10, G10L9/02
CPC classification: G10L15/063, G10L15/02
Abstract: A speech coding apparatus and method for use in a speech recognition apparatus and method. The value of at least one feature of an utterance is measured during each of a series of successive time intervals to produce a series of feature vector signals representing the feature values. A plurality of prototype vector signals, each having at least one parameter value and a unique identification value are stored. The closeness of the feature vector signal is compared to the parameter values of the prototype vector signals to obtain prototype match scores for the feature value signal and each prototype vector signal. The identification value of the prototype vector signal having the best prototype match score is output as a coded representation signal of the feature vector signal. Speaker-dependent prototype vector signals are generated from both synthesized training vector signals and measured training vector signals. The synthesized training vector signals are transformed reference feature vector signals representing the values of features of one or more utterances of one or more speakers in a reference set of speakers. The measured training feature vector signals represent the values of features of one or more utterances of a new speaker/user not in the reference set.
