Quick Patent Search - Search global patents quickly, free patent database for commercial use - IPRDB

1. Granted invention patent

US5710866A System and method for speech recognition using dynamically adjusted confidence measure [Expired]
Publication No.: US5710866A
Publication date: 1998-01-20
Application No.: US452141
Filing date: 1995-05-26
Applicant: Fileno A. Alleva, Douglas H. Beeferman, Xuedong Huang
Inventor: Fileno A. Alleva, Douglas H. Beeferman, Xuedong Huang
IPC classification: G10L15/10, G10L15/14, G10L9/00
CPC classification: G10L15/10, G10L15/142
Abstract: A computer-implemented method of recognizing an input speech utterance compares the input speech utterance to a plurality of hidden Markov models to obtain a constrained acoustic score that reflects the probability that the hidden Markov model matches the input speech utterance. The method computes a confidence measure for each hidden Markov model that reflects the probability of the constrained acoustic score being correct. The computed confidence measure is then used to adjust the constrained acoustic score. Preferably, the confidence measure is computed based on a difference between the constrained acoustic score and an unconstrained acoustic score that is computed independently of any language context. In addition, a new confidence measure preferably is computed for each input speech frame from the input speech utterance so that the constrained acoustic score is adjusted for each input speech frame.
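The per-frame adjustment described in this abstract can be sketched roughly as follows. This is only an illustration, not the patented method: the abstract does not give the formula, so the exponential-style penalty, the function names `log_confidence` and `adjust_scores`, and the `scale` and `weight` parameters are all assumptions. Scores are taken to be log probabilities.

```python
def log_confidence(constrained: float, unconstrained: float, scale: float = 1.0) -> float:
    """Hypothetical log-confidence: 0 when the two log scores agree,
    increasingly negative as the unconstrained score exceeds the constrained one."""
    gap = max(unconstrained - constrained, 0.0)
    return -scale * gap


def adjust_scores(constrained_frames, unconstrained_frames, weight: float = 1.0):
    """Adjust the constrained score of every frame by that frame's own
    confidence, so a new confidence value is computed per input frame."""
    adjusted = []
    for c, u in zip(constrained_frames, unconstrained_frames):
        adjusted.append(c + weight * log_confidence(c, u))
    return adjusted
```

Under these assumptions, a frame whose constrained score already matches the unconstrained score is left unchanged, while a frame with a large gap is penalized.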

2. Granted invention patent

US5794197A Senone tree representation and evaluation [Expired]
Publication No.: US5794197A
Publication date: 1998-08-11
Application No.: US850061
Filing date: 1997-05-02
Applicant: Fileno A. Alleva, Xuedong Huang, Mei-Yuh Hwang
Inventor: Fileno A. Alleva, Xuedong Huang, Mei-Yuh Hwang
IPC classification: G10L15/02, G10L15/06, G10L15/14, G10L15/18, G10L5/06
CPC classification: G10L15/146, G10L15/187, G10L2015/0631
Abstract: A speech recognition method provides improved modeling in recognition accuracy using hidden Markov models. During training, the method creates a senone tree for each state of each phoneme encountered in a data set of training words. All output distributions received for a selected state of a selected phoneme in the set of training words are clustered together in a root node of a senone tree. Each node of the tree beginning with the root node is divided into two nodes by asking linguistic questions regarding the phonemes immediately to the left and right of a central phoneme of a triphone. At a predetermined point, the tree creation stops, resulting in leaves representing clustered output distributions known as senones. The senone trees allow all possible triphones to be mapped into a sequence of senones simply by traversing the senone trees associated with the central phoneme of the triphone. As a result, unseen triphones not encountered in the training data can be modeled with senones created using the triphones actually found in the training data.
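The traversal step in this abstract, mapping any triphone (including an unseen one) to a senone by answering questions about its left and right neighbors, can be illustrated with a small sketch. The `Node` layout, the phone sets, and the senone labels `s1` to `s3` are made up for illustration. Tree construction (the clustering and the stopping criterion) is not shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    side: Optional[str] = None             # "left" or "right" context; None at a leaf
    phone_set: Optional[frozenset] = None  # question: is the context phone in this set?
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    senone: Optional[str] = None           # set only on leaves


def find_senone(node: Node, left: str, right: str) -> str:
    """Walk one state's senone tree for a triphone with the given contexts."""
    while node.senone is None:
        context = left if node.side == "left" else right
        node = node.yes if context in node.phone_set else node.no
    return node.senone


# Hypothetical tree for one state of one central phoneme.
NASALS = frozenset({"m", "n", "ng"})
VOWELS = frozenset({"aa", "iy", "uw"})
tree = Node(side="left", phone_set=NASALS,
            yes=Node(senone="s1"),
            no=Node(side="right", phone_set=VOWELS,
                    yes=Node(senone="s2"), no=Node(senone="s3")))
```

Because every question has both a yes and a no branch, any pair of neighboring phonemes reaches some leaf, which is why triphones absent from the training data still receive a senone.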

3. Granted invention patent

US07016838B2 Method and system for frame alignment and unsupervised adaptation of acoustic models [Expired]
Publication No.: US07016838B2
Publication date: 2006-03-21
Application No.: US10987529
Filing date: 2004-11-12
Applicant: William H. Rockenbeck, Milind V. Mahajan, Fileno A. Alleva
Inventor: William H. Rockenbeck, Milind V. Mahajan, Fileno A. Alleva
IPC classification: G10L15/06
CPC classification: G10L15/065
Abstract: An unsupervised adaptation method and apparatus are provided that reduce the storage and time requirements associated with adaptation. Under the invention, utterances are converted into feature vectors, which are decoded to produce a transcript and alignment unit boundaries for the utterance. Individual alignment units and the feature vectors associated with those alignment units are then provided to an alignment function, which aligns the feature vectors with the states of each alignment unit. Because the alignment is performed within alignment unit boundaries, fewer feature vectors are used and the time for alignment is reduced. After alignment, the feature vector dimensions aligned to a state are added to dimension sums that are kept for that state. After all the states in an utterance have had their sums updated, the speech signal and the alignment units are deleted. Once sufficient frames of data have been received to perform adaptive training, the acoustic model is adapted.
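The storage saving described here comes from keeping only running per-state sums instead of the utterances themselves. A minimal sketch of that bookkeeping follows, assuming alignment has already produced (state, feature vector) pairs. The class name `StateSums` and its methods are hypothetical, and the adaptation step itself is not specified by the abstract, so only a per-state mean is shown.

```python
from collections import defaultdict


class StateSums:
    """Per-state accumulators; the aligned frames can be discarded after each update."""

    def __init__(self, dim: int):
        self.sums = defaultdict(lambda: [0.0] * dim)  # one sum per feature dimension
        self.counts = defaultdict(int)                # frames aligned to each state

    def add_utterance(self, aligned):
        """aligned: iterable of (state_id, feature_vector) pairs for one utterance."""
        for state, vec in aligned:
            acc = self.sums[state]
            for i, x in enumerate(vec):
                acc[i] += x
            self.counts[state] += 1

    def total_frames(self) -> int:
        return sum(self.counts.values())

    def mean(self, state):
        """Illustrative statistic an adaptation step might read from the sums."""
        n = self.counts[state]
        return [s / n for s in self.sums[state]]
```

A caller would check `total_frames()` against some threshold (an assumed policy) before triggering adaptation.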

4. Granted invention patent

US06263308B1 Methods and apparatus for performing speech recognition using acoustic models which are improved through an interactive process [In force]
Publication No.: US06263308B1
Publication date: 2001-07-17
Application No.: US09531055
Filing date: 2000-03-20
Applicant: David E. Heckerman, Fileno A. Alleva, Robert L. Rounthwaite, Daniel Rosen, Mei-Yuh Hwang, Yoram Yaacovi, John L. Manferdelli
Inventor: David E. Heckerman, Fileno A. Alleva, Robert L. Rounthwaite, Daniel Rosen, Mei-Yuh Hwang, Yoram Yaacovi, John L. Manferdelli
IPC classification: G10L15/02
CPC classification: G10L15/063
Abstract: Automated methods and apparatus for synchronizing audio and text data, e.g., in the form of electronic files, representing audio and text expressions of the same work or information are described. Also described are automated methods of detecting errors and other discrepancies between the audio and text versions of the same work. A speech recognition operation is performed on the audio data initially using a speaker independent acoustic model. The recognized text in addition to audio time stamps are produced by the speech recognition operation. The recognized text is compared to the text in text data to identify correctly recognized words. The acoustic model is then retrained using the correctly recognized text and corresponding audio segments from the audio data transforming the initial acoustic model into a speaker trained acoustic model. The retrained acoustic model is then used to perform an additional speech recognition operation on the audio data. The audio and text data are synchronized using the results of the updated acoustic model. In addition, one or more error reports based on the final recognition results are generated showing discrepancies between the recognized words and the words included in the text. By retraining the acoustic model in the above described manner, improved accuracy is achieved.

5. Invention patent application

US20100312565A1 INTERACTIVE TTS OPTIMIZATION TOOL [In force]
Publication No.: US20100312565A1
Publication date: 2010-12-09
Application No.: US12481510
Filing date: 2009-06-09
Applicant: Jian-Chao Wang, Lu-Jun Yuan, Sheng Zhao, Fileno A. Alleva, Jingyang Xu, Chiwei Che
Inventor: Jian-Chao Wang, Lu-Jun Yuan, Sheng Zhao, Fileno A. Alleva, Jingyang Xu, Chiwei Che
IPC classification: G10L13/04
CPC classification: G10L13/033, G10L13/00, G10L13/04
Abstract: An interactive prompt generation and TTS optimization tool with a user-friendly graphical user interface is provided. The tool accepts HTS abstraction or speech recognition processed input from a user to generate an enhanced initial waveform for synthesis. Acoustic features of the waveform are presented to the user with graphical visualizations enabling the user to modify various parameters of the speech synthesis process and listen to modified versions until an acceptable end product is reached.

6. Granted invention patent

US06934683B2 Disambiguation language model [Expired]
Publication No.: US06934683B2
Publication date: 2005-08-23
Application No.: US09773242
Filing date: 2001-01-31
Applicant: Yun-cheng Ju, Fileno A. Alleva
Inventor: Yun-cheng Ju, Fileno A. Alleva
IPC classification: G10L15/06, G10L15/18, G10L15/14
CPC classification: G10L15/18, G10L15/063
Abstract: A language model for a language processing system such as a speech recognition system is formed as a function of associated characters, word phrases and context cues. A method and apparatus for generating the training corpus used to train the language model and a system or module using such a language model is disclosed.

7. Granted invention patent

US6076056A Speech recognition system for recognizing continuous and isolated speech [Expired]
Publication No.: US6076056A
Publication date: 2000-06-13
Application No.: US934622
Filing date: 1997-09-19
Applicant: Xuedong D. Huang, Fileno A. Alleva, Li Jiang, Mei-Yuh Hwang
Inventor: Xuedong D. Huang, Fileno A. Alleva, Li Jiang, Mei-Yuh Hwang
IPC classification: G10L15/02, G10L15/04, G10L15/06, G10L15/08, G10L15/14, G10L15/28
CPC classification: G10L15/08, G10L15/05
Abstract: Speech recognition is performed by receiving isolated speech training data indicative of a plurality of discretely spoken training words, and receiving continuous speech training data indicative of a plurality of continuously spoken training words. A plurality of speech unit models is trained based on the isolated speech training data and the continuous speech training data. Speech is recognized based on the speech unit models trained.

8. Granted invention patent

US06973427B2 Method for adding phonetic descriptions to a speech recognition lexicon [Expired]
Publication No.: US06973427B2
Publication date: 2005-12-06
Application No.: US09748453
Filing date: 2000-12-26
Applicant: Mei-Yuh Hwang, Fileno A. Alleva, Rebecca C. Weiss
Inventor: Mei-Yuh Hwang, Fileno A. Alleva, Rebecca C. Weiss
IPC classification: G10L15/06, G06F17/21, G01L15/04, G01L15/06, G01L15/08
CPC classification: G10L15/063, G10L2015/0636
Abstract: A method and computer-readable medium convert the text of a word and a user's pronunciation of the word into a phonetic description to be added to a speech recognition lexicon. Initially, two possible phonetic descriptions are generated. One phonetic description is formed from the text of the word. The other phonetic description is formed by decoding a speech signal representing the user's pronunciation of the word. Both phonetic descriptions are scored based on their correspondence to the user's pronunciation. The phonetic description with the highest score is then selected for entry in the speech recognition lexicon.

9. Granted invention patent

US06856956B2 Method and apparatus for generating and displaying N-best alternatives in a speech recognition system [In force]
Publication No.: US06856956B2
Publication date: 2005-02-15
Application No.: US09804117
Filing date: 2001-03-12
Applicant: Chris Thrasher, Fileno A. Alleva
Inventor: Chris Thrasher, Fileno A. Alleva
IPC classification: G06F9/44, G10L15/08, G10L15/18, G10L15/22, G10L15/26, G10L15/28, H04L29/06, H04L29/08, G10L15/10
CPC classification: G10L15/28, G06F9/4488, G10L15/083, G10L15/197, G10L15/26, H04L29/06027
Abstract: The present invention is directed to a method and apparatus for generating alternatives to words indicative of recognized speech. A reference path of recognized words is generated, based upon input speech data. An operator selection input is received and is indicative of a selected portion of the recognized speech, for which alternatives are to be generated. Boundary conditions for alternatives to be generated are calculated based upon bounds of a reference subpath corresponding to the selected portion of the recognized speech. Alternate subpaths satisfying the boundary conditions are constructed from a hypothesis store which corresponds to the input speech data.

10. Granted invention patent

US06336108B1 Speech recognition with mixtures of bayesian networks [In force]
Publication No.: US06336108B1
Publication date: 2002-01-01
Application No.: US09220197
Filing date: 1998-12-23
Applicant: Bo Thiesson, Christopher A. Meek, David Maxwell Chickering, David Earl Heckerman, Fileno A. Alleva, Mei-Yuh Hwang
Inventor: Bo Thiesson, Christopher A. Meek, David Maxwell Chickering, David Earl Heckerman, Fileno A. Alleva, Mei-Yuh Hwang
IPC classification: G06F15/18
CPC classification: G06K9/6296, G06N5/025, Y10S707/99945, Y10S707/99948
Abstract: The invention performs speech recognition using an array of mixtures of Bayesian networks. A mixture of Bayesian networks (MBN) consists of plural hypothesis-specific Bayesian networks (HSBNs) having possibly hidden and observed variables. A common external hidden variable is associated with the MBN, but is not included in any of the HSBNs. The number of HSBNs in the MBN corresponds to the number of states of the common external hidden variable, and each HSBN models the world under the hypothesis that the common external hidden variable is in a corresponding one of those states. In accordance with the invention, the MBNs encode the probabilities of observing the sets of acoustic observations given the utterance of a respective one of said parts of speech. Each of the HSBNs encodes the probabilities of observing the sets of acoustic observations given the utterance of a respective one of the parts of speech and given a hidden common variable being in a particular state. Each HSBN has nodes corresponding to the elements of the acoustic observations. These nodes store probability parameters corresponding to the probabilities with causal links representing dependencies between ones of said nodes.
