Quick Patent Search - Search global patents quickly, free patent database for commercial use - IPRDB

1. Granted invention patent

US07464031B2 Speech recognition utilizing multitude of speech features (lapsed)
Publication No.: US07464031B2
Publication date: 2008-12-09
Application No.: US10724536
Filing date: 2003-11-28
Applicant: Scott E. Axelrod, Sreeram Viswanath Balakrishnan, Stanley F. Chen, Yuqing Gao, Ramesh A. Gopinath, Hong-Kwang Kuo, Benoit Maison, David Nahamoo, Michael Alan Picheny, George A. Saon, Geoffrey G. Zweig
Inventor: Scott E. Axelrod, Sreeram Viswanath Balakrishnan, Stanley F. Chen, Yuqing Gao, Ramesh A. Gopinath, Hong-Kwang Kuo, Benoit Maison, David Nahamoo, Michael Alan Picheny, George A. Saon, Geoffrey G. Zweig
IPC classification: G10L15/00, G10L15/20
CPC classification: G10L15/063, G10L15/02, G10L15/14, G10L2015/085
Abstract: In a speech recognition system, the combination of a log-linear model with a multitude of speech features is provided to recognize unknown speech utterances. The speech recognition system models the posterior probability of linguistic units relevant to speech recognition using a log-linear model. The posterior model captures the probability of the linguistic unit given the observed speech features and the parameters of the posterior model. The posterior model may be determined using the probability of the word sequence hypotheses given a multitude of speech features. Log-linear models are used with features derived from sparse or incomplete data. The speech features that are utilized may include asynchronous, overlapping, and statistically non-independent speech features. Not all features used in training need to appear in testing/recognition.
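The log-linear posterior described in this abstract can be illustrated with a minimal sketch. The weight layout, feature names, and candidate units below are hypothetical and are not taken from the patent; the point is only that a feature missing at recognition time contributes nothing to the score, so training and test feature sets need not match.

```python
import math

def log_linear_posterior(weights, active_features, units):
    """Return P(unit | features) for each candidate unit.

    weights: dict mapping (unit, feature_name) -> weight (hypothetical layout).
    active_features: dict mapping feature_name -> value, only for features
        that were actually observed.
    units: list of candidate linguistic units.
    """
    scores = {}
    for unit in units:
        # A feature that is absent (or has no trained weight) adds zero.
        scores[unit] = sum(
            weights.get((unit, name), 0.0) * value
            for name, value in active_features.items()
        )
    # Normalize with a numerically stable softmax.
    top = max(scores.values())
    exp_scores = {u: math.exp(s - top) for u, s in scores.items()}
    total = sum(exp_scores.values())
    return {u: e / total for u, e in exp_scores.items()}

# Illustrative weights only.
weights = {
    ("yes", "acoustic_match_yes"): 2.0,
    ("no", "acoustic_match_no"): 2.0,
    ("yes", "lm_prefers_yes"): 1.0,
}
posterior = log_linear_posterior(
    weights,
    {"acoustic_match_yes": 1.0, "lm_prefers_yes": 1.0},
    ["yes", "no"],
)
```

Because the features are simply summed in the exponent, overlapping or statistically dependent features can be added without any independence assumption, which is the property the abstract emphasizes.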

2. Invention patent application

US20080312921A1 SPEECH RECOGNITION UTILIZING MULTITUDE OF SPEECH FEATURES (under examination, published)
Publication No.: US20080312921A1
Publication date: 2008-12-18
Application No.: US12195123
Filing date: 2008-08-20
Applicant: Scott E. Axelrod, Sreeram Viswanath Balakrishnan, Stanley F. Chen, Yuqing Gao, Ramesh A. Gopinath, Hong-Kwang Kuo, Benoit Maison, David Nahamoo, Michael Alan Picheny, George A. Saon, Geoffrey G. Zweig
Inventor: Scott E. Axelrod, Sreeram Viswanath Balakrishnan, Stanley F. Chen, Yuqing Gao, Ramesh A. Gopinath, Hong-Kwang Kuo, Benoit Maison, David Nahamoo, Michael Alan Picheny, George A. Saon, Geoffrey G. Zweig
IPC classification: G10L15/00, G10L15/04
CPC classification: G10L15/063, G10L15/02, G10L15/14, G10L2015/085
Abstract: In a speech recognition system, the combination of a log-linear model with a multitude of speech features is provided to recognize unknown speech utterances. The speech recognition system models the posterior probability of linguistic units relevant to speech recognition using a log-linear model. The posterior model captures the probability of the linguistic unit given the observed speech features and the parameters of the posterior model. The posterior model may be determined using the probability of the word sequence hypotheses given a multitude of speech features. Log-linear models are used with features derived from sparse or incomplete data. The speech features that are utilized may include asynchronous, overlapping, and statistically non-independent speech features. Not all features used in training need to appear in testing/recognition.

3. Granted invention patent

US5680509A Method and apparatus for estimating phone class probabilities a-posteriori using a decision tree (lapsed)
Publication No.: US5680509A
Publication date: 1997-10-21
Application No.: US312584
Filing date: 1994-09-27
Applicant: Ponani S. Gopalakrishnan, David Nahamoo, Mukund Padmanabhan, Michael Alan Picheny
Inventor: Ponani S. Gopalakrishnan, David Nahamoo, Mukund Padmanabhan, Michael Alan Picheny
IPC classification: G10L15/06, G10L15/08, G10L5/06
CPC classification: G10L15/063, G10L15/08
Abstract: A method and apparatus for estimating the probability of phones, a-posteriori, in the context of not only the acoustic feature at that time, but also the acoustic features in the vicinity of the current time, and its use in cutting down the search-space in a speech recognition system. The method constructs and uses a decision tree, with the predictors of the decision tree being the vector-quantized acoustic feature vectors at the current time, and in the vicinity of the current time. The process starts with an enumeration of all (predictor, class) events in the training data at the root node, and successively partitions the data at a node according to the most informative split at that node. An iterative algorithm is used to design the binary partitioning. After the construction of the tree is completed, the probability distribution of the predicted class is stored at all of its terminal leaves. The decision tree is used during the decoding process by tracing a path down to one of its leaves, based on the answers to binary questions about the vector-quantized acoustic feature vector at the current time and its vicinity.
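The decoding-time use of such a tree can be sketched as follows. The node layout (`offset`, `label_set`, `yes`, `no`, `dist`), the phone names, and the pruning threshold are all hypothetical; the abstract does not specify a data structure, and tree construction is not shown.

```python
def phone_posterior(tree, labels, t):
    """Trace a path to a leaf and return its stored phone distribution.

    tree: nested dicts. Internal nodes ask "is the vector-quantized label at
        frame t + offset in label_set?"; leaves hold 'dist' (phone -> prob).
    labels: sequence of vector-quantized labels, one per frame.
    t: index of the current frame.
    """
    node = tree
    while "dist" not in node:
        # Clamp so questions about neighbors stay inside the utterance.
        idx = min(max(t + node["offset"], 0), len(labels) - 1)
        node = node["yes"] if labels[idx] in node["label_set"] else node["no"]
    return node["dist"]

# A tiny hand-built tree for illustration only.
tree = {
    "offset": 0, "label_set": {3, 7},
    "yes": {
        "offset": -1, "label_set": {3},
        "yes": {"dist": {"AA": 0.9, "IY": 0.1}},
        "no": {"dist": {"AA": 0.6, "IY": 0.4}},
    },
    "no": {"dist": {"AA": 0.1, "IY": 0.9}},
}
labels = [3, 7, 2]
dist = phone_posterior(tree, labels, t=1)
# Keep only sufficiently likely phones to shrink the search space.
shortlist = [phone for phone, p in dist.items() if p >= 0.2]
```

Pruning by a fixed probability threshold is one simple way to cut down the search space; the patent may use a different criterion.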

4. Granted invention patent

US6023673A Hierarchical labeler in a speech recognition system (lapsed)
Publication No.: US6023673A
Publication date: 2000-02-08
Application No.: US869061
Filing date: 1997-06-04
Applicant: Raimo Bakis, David Nahamoo, Michael Alan Picheny, Jan Sedivy
Inventor: Raimo Bakis, David Nahamoo, Michael Alan Picheny, Jan Sedivy
IPC classification: G10L5/06, G10L9/00
CPC classification: G10L15/083
Abstract: A speech coding apparatus and method uses a hierarchy of prototype sets to code an utterance while consuming fewer computing resources. The value of at least one feature of an utterance is measured during each of a series of successive time intervals to produce a series of feature vector signals representing the feature values. A plurality of level subsets of prototype vector signals is computed, wherein each prototype vector signal in a higher level subset is associated with at least one prototype vector signal in a lower level subset. Each level subset contains a plurality of prototype vector signals, with lower level subsets containing more prototypes than higher level subsets. The closeness of the feature value of the first feature vector signal is compared to the parameter values of prototype vector signals in the first level subset of prototype vector signals to obtain a ranked list of prototype match scores for the first feature vector signal and each prototype vector signal in the first level subset. The closeness of the feature value of the first feature vector signal is compared to the parameter values of each prototype vector signal in a second (lower) level subset that is associated with the highest ranking prototype vectors in the first level subset, to obtain a second ranked list of prototype match scores. The identification value of the prototype vector signal in the second ranked list having the best prototype match score is output as a coded utterance representation signal of the first feature vector signal.
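The two-level search in this abstract can be sketched as below. Squared Euclidean distance as the match score, the `keep` parameter, and the toy prototypes are assumptions made for illustration; the patent speaks only of "closeness" and "prototype match scores".

```python
def hierarchical_label(feature, top_level, children, keep=2):
    """Return the id of the best lower-level prototype for one feature vector.

    top_level: dict mapping higher-level prototype id -> vector.
    children: dict mapping higher-level id -> {lower-level id -> vector}.
    keep: how many top-ranked higher-level prototypes to expand (assumed).
    """
    def dist(a, b):
        # Squared Euclidean distance stands in for the match score.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # First ranked list: score the small higher-level set.
    ranked = sorted(top_level, key=lambda pid: dist(feature, top_level[pid]))
    # Second ranked list: score only children of the best higher-level entries.
    candidates = {}
    for pid in ranked[:keep]:
        candidates.update(children[pid])
    return min(candidates, key=lambda cid: dist(feature, candidates[cid]))

top_level = {"A": (0, 0), "B": (10, 10), "C": (20, 0)}
children = {
    "A": {"a1": (0, 1), "a2": (1, 0)},
    "B": {"b1": (9, 9), "b2": (11, 11)},
    "C": {"c1": (19, 0), "c2": (21, 1)},
}
label = hierarchical_label((0.9, 0.2), top_level, children)
```

Here only four of the six lower-level prototypes are scored; with realistic prototype counts the saving is what lets the labeler consume fewer computing resources.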

5. Granted invention patent

US5649060A Automatic indexing and aligning of audio and text using speech recognition (lapsed)
Publication No.: US5649060A
Publication date: 1997-07-15
Application No.: US547113
Filing date: 1995-10-23
Applicant: Hamed A. Ellozy, Dimitri Kanevsky, Michelle Y. Kim, David Nahamoo, Michael Alan Picheny, Wlodek Wlodzimierz Zadrozny
Inventor: Hamed A. Ellozy, Dimitri Kanevsky, Michelle Y. Kim, David Nahamoo, Michael Alan Picheny, Wlodek Wlodzimierz Zadrozny
IPC classification: G03B31/00, G06F17/30, G10L15/00, G10L15/18, G10L15/22, G10L15/26, G11B27/028, G11B27/10, G11B27/28, H04N5/91, G10L9/00
CPC classification: G11B27/28, G06F17/30746, G11B27/028, G11B27/10
Abstract: A method of automatically aligning a written transcript with speech in video and audio clips. The disclosed technique involves as a basic component an automatic speech recognizer. The automatic speech recognizer decodes speech (recorded on a tape) and produces a file with a decoded text. This decoded text is then matched with the original written transcript via identification of similar words or clusters of words. The result of this matching is an alignment of the speech with the original transcript. The method can be used (a) to create indexing of video clips, (b) for "teleprompting" (i.e. showing the next portion of text when someone is reading from a television screen), or (c) to enhance editing of a text that was dictated to a stenographer or recorded on a tape for its subsequent textual reproduction by a typist.
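The matching step can be approximated with Python's standard `difflib.SequenceMatcher`, which finds runs of identical words shared by two sequences. This is a stand-in technique, not the patent's matching procedure, and the decoded words and timestamps below are made up for illustration.

```python
from difflib import SequenceMatcher

def align_transcript(decoded, transcript):
    """Attach recognizer timestamps to transcript words that match.

    decoded: list of (word, start_seconds) pairs from a speech recognizer.
    transcript: list of words from the original written transcript.
    Returns a list of (transcript_index, start_seconds) anchors.
    """
    decoded_words = [word.lower() for word, _ in decoded]
    script_words = [word.lower() for word in transcript]
    matcher = SequenceMatcher(a=decoded_words, b=script_words, autojunk=False)
    anchors = []
    for block in matcher.get_matching_blocks():
        # Each block is a run of identical words in both sequences.
        for k in range(block.size):
            anchors.append((block.b + k, decoded[block.a + k][1]))
    return anchors

decoded = [("the", 0.0), ("quick", 0.4), ("brown", 0.8), ("box", 1.1), ("jumps", 1.5)]
transcript = ["The", "quick", "brown", "fox", "jumps"]
anchors = align_transcript(decoded, transcript)
```

The misrecognized word ("box" for "fox") gets no anchor of its own; its time can be interpolated from the neighboring anchors when building an index.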

6. Granted invention patent

US08145491B2 Techniques for enhancing the performance of concatenative speech synthesis (in force)
Publication No.: US08145491B2
Publication date: 2012-03-27
Application No.: US10208453
Filing date: 2002-07-30
Applicant: Wael Mohamed Hamza, Michael Alan Picheny
Inventor: Wael Mohamed Hamza, Michael Alan Picheny
IPC classification: G10L13/06
CPC classification: G10L13/07
Abstract: When pitch of a speech segment is being modified from a current pitch to a requested pitch, and the difference between these is relatively large, a pitch modification algorithm is used to modify the pitch of the speech segment. When the difference between current and requested pitches is relatively small, the pitch of the speech segment is not modified. After one or the other speech modification technique is used, the resultant modified speech segment is overlapped and added to previously modified speech segments. A modification ratio is determined in order to quantify the difference between the current and requested pitches for a speech segment. The modification ratio is a ratio between the requested and current pitches. Low and high ratio thresholds are used to determine when pitch is being modified to a predetermined high degree, and whether pitch of the speech segment will or will not be modified.
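The threshold test on the modification ratio can be sketched in a few lines. The threshold values 0.8 and 1.25 are placeholders chosen for illustration; the abstract does not state the actual low and high thresholds.

```python
def should_modify_pitch(current_hz, requested_hz, low=0.8, high=1.25):
    """Decide whether a segment's pitch should be modified.

    The modification ratio is requested pitch divided by current pitch.
    Ratios between `low` and `high` count as a small change and are left
    alone; both thresholds are hypothetical values.
    """
    ratio = requested_hz / current_hz
    return ratio < low or ratio > high

# (current pitch, requested pitch) in Hz for three example segments.
segments = [(100.0, 105.0), (100.0, 150.0), (200.0, 120.0)]
plan = [should_modify_pitch(cur, req) for cur, req in segments]
```

Leaving near-target segments untouched avoids the artifacts that a pitch modification algorithm can introduce when the requested change is small.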

7. Granted invention patent

US06073096A Speaker adaptation system and method based on class-specific pre-clustering training speakers (lapsed)
Publication No.: US06073096A
Publication date: 2000-06-06
Application No.: US18350
Filing date: 1998-02-04
Applicant: Yuqing Gao, Mukund Padmanabhan, Michael Alan Picheny
Inventor: Yuqing Gao, Mukund Padmanabhan, Michael Alan Picheny
IPC classification: G10L15/07, G10L15/06
CPC classification: G10L15/07
Abstract: A method of speech recognition in accordance with the present invention includes the steps of grouping acoustics to form classes based on acoustic features, clustering training speakers by the classes to provide class-specific cluster systems, selecting from the cluster systems a subset of cluster systems closest to adaptation data from a test speaker, transforming the subset of cluster systems to bring the subset of cluster systems closer to the test speaker based on the adaptation data to form adapted cluster systems, and combining the adapted cluster systems to create a speaker adapted system for decoding speech from the test speaker. Systems and methods for building speech recognition systems as well as adapting speaker systems for class-specific speaker clusters are included.

8. Granted invention patent

US5806021A Automatic segmentation of continuous text using statistical approaches (lapsed)
Publication No.: US5806021A
Publication date: 1998-09-08
Application No.: US700823
Filing date: 1996-09-04
Applicant: Chengjun Julian Chen, Fu-Hua Liu, Michael Alan Picheny
Inventor: Chengjun Julian Chen, Fu-Hua Liu, Michael Alan Picheny
IPC classification: G06F17/27, G06F17/20
CPC classification: G06F17/277
Abstract: An automatic segmenter for continuous text segments such text in a rapid, consistent and semantically accurate manner. Two statistical methods for segmentation of continuous text are used. The first method, called "forward-backward matching", is easy and fast but can produce occasional errors in long phrases. The second method, called "statistical stack search segmenter", utilizes statistical language models to generate more accurate segmentation output at the expense of two times more execution time than the "forward-backward matching" method. In some applications where speed is a major concern, "forward-backward matching" can be used, while in other applications where highly accurate output is desired, "statistical stack search segmenter" is ideal.
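A common reading of "forward-backward matching" is bidirectional maximum matching against a word list: segment greedily from the left, segment greedily from the right, and keep the better result. The sketch below follows that reading. The tie-breaking rule (fewer words, then fewer single-character words, then the backward result), the tiny lexicon, and the use of unspaced Chinese text as the example are assumptions, not details from the abstract.

```python
def forward_match(text, lexicon, max_len):
    """Greedy longest-match segmentation scanning left to right."""
    out, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:  # unknown single chars pass through
                out.append(piece)
                i += size
                break
    return out

def backward_match(text, lexicon, max_len):
    """Greedy longest-match segmentation scanning right to left."""
    out, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                out.append(piece)
                j -= size
                break
    return out[::-1]

def segment(text, lexicon):
    """Run both passes and keep the one with the lower (assumed) cost."""
    max_len = max(map(len, lexicon))
    fwd = forward_match(text, lexicon, max_len)
    bwd = backward_match(text, lexicon, max_len)

    def cost(seg):
        return (len(seg), sum(len(word) == 1 for word in seg))

    return fwd if cost(fwd) < cost(bwd) else bwd

lexicon = {"研究", "研究生", "生命", "起源"}
words = segment("研究生命起源", lexicon)
```

In this example the forward pass greedily takes "研究生" and strands "命" as a single character, while the backward pass recovers "研究 / 生命 / 起源". Errors of this kind in longer phrases are what a language-model-based search is meant to reduce.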

9. Granted invention patent

US07610191B2 Method for fast semi-automatic semantic annotation (in force)
Publication No.: US07610191B2
Publication date: 2009-10-27
Application No.: US10959523
Filing date: 2004-10-06
Applicant: Yuqing Gao, Michael Alan Picheny, Ruhi Sarikaya
Inventor: Yuqing Gao, Michael Alan Picheny, Ruhi Sarikaya
IPC classification: G06F17/27
CPC classification: G06F17/271, G06F17/2755
Abstract: A method, apparatus and computer instructions are provided for fast semi-automatic semantic annotation. Given a limited annotated corpus, the present invention assigns a tag and a label to each word of the next limited annotated corpus using a parser engine, a similarity engine, and a SVM engine. A rover then combines the parse trees from the three engines and annotates the next chunk of limited annotated corpus with confidence, such that the effort required for human annotation is reduced.
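The combination step can be illustrated with a simplified voting sketch. It works on flat per-word tags rather than full parse trees, uses plain majority voting, and reports the share of agreeing engines as the confidence; all of these are simplifying assumptions, and the tag names are invented.

```python
from collections import Counter

def rover_vote(parser_tags, similarity_tags, svm_tags):
    """Combine three engines' per-word tags by majority vote.

    Returns a list of (tag, confidence) pairs, where confidence is the
    fraction of engines that agreed on the winning tag. When all three
    disagree, the first engine's tag wins with confidence 1/3.
    """
    combined = []
    for votes in zip(parser_tags, similarity_tags, svm_tags):
        tag, count = Counter(votes).most_common(1)[0]
        combined.append((tag, count / len(votes)))
    return combined

# Hypothetical tags for a three-word chunk.
result = rover_vote(
    ["O", "CITY", "O"],
    ["O", "CITY", "DATE"],
    ["O", "AIRPORT", "DATE"],
)
# Words with low agreement can be routed to a human annotator.
needs_review = [i for i, (_, conf) in enumerate(result) if conf < 1.0]
```

Sending only the low-confidence words to a person is one way such a scheme could reduce the human annotation effort.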

10. Granted invention patent

US5751905A Statistical acoustic processing method and apparatus for speech recognition using a toned phoneme system (lapsed)
Publication No.: US5751905A
Publication date: 1998-05-12
Application No.: US404786
Filing date: 1995-03-15
Applicant: Chengjun Julian Chen, Ramesh Ambat Gopinath, Michael Daniel Monkowski, Michael Alan Picheny
Inventor: Chengjun Julian Chen, Ramesh Ambat Gopinath, Michael Daniel Monkowski, Michael Alan Picheny
IPC classification: G10L15/10, G10L11/04, G10L15/14, G10L15/18, G10L5/06
CPC classification: G10L25/90, G10L15/142, G10L25/06, G10L25/15
Abstract: A method and apparatus for acoustic signal processing in speech recognition, the method comprising the following components: 1) each syllable is decomposed into two phonemes of comparable length and complexity, the first one being a preme and the second one being a toneme; 2) each toneme is assigned a tone value such as high, rising, low, falling, and untoned; 3) no tone value is assigned to premes; 4) pitch is detected continuously and treated the same way as energy and cepstra in a Hidden Markov Model to predict the tone of a toneme; 5) the tone of a syllable is defined as the tone of its component toneme.
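Components 1, 2, 3 and 5 can be illustrated on Mandarin syllables written in pinyin with a trailing tone digit. Treating the preme as the pinyin initial, and mapping tone digits 1 to 5 onto the five tone values, are assumptions for illustration only; the patent's actual preme and toneme inventory may differ, and the pitch modelling in component 4 is not shown.

```python
# Two-letter initials come first so "zh" is not matched as "z".
INITIALS = (
    "zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
    "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w",
)
TONES = {"1": "high", "2": "rising", "3": "low", "4": "falling", "5": "untoned"}

def decompose(syllable):
    """Split e.g. 'zhong1' into an untoned preme and a toned toneme.

    Returns (preme, (final, tone)). The preme carries no tone value;
    the tone is attached to the toneme only.
    """
    body, tone_digit = syllable[:-1], syllable[-1]
    preme = next((ini for ini in INITIALS if body.startswith(ini)), "")
    return preme, (body[len(preme):], TONES[tone_digit])

def syllable_tone(syllable):
    """The tone of a syllable is the tone of its component toneme."""
    _, (_, tone) = decompose(syllable)
    return tone

preme, toneme = decompose("zhong1")
```

A syllable with no initial, such as "an4", yields an empty preme in this sketch; a real system would need an explicit convention for that case.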
