    • 1. Invention grant
    • Title: Methods and apparatus for performing speech recognition using acoustic models which are improved through an interactive process
    • Publication No.: US06263308B1
    • Publication Date: 2001-07-17
    • Application No.: US09531055
    • Filing Date: 2000-03-20
    • Inventors: David E. Heckerman; Fileno A. Alleva; Robert L. Rounthwaite; Daniel Rosen; Mei-Yuh Hwang; Yoram Yaacovi; John L. Manferdelli
    • IPC: G10L15/02
    • CPC: G10L15/063
    • Abstract: Automated methods and apparatus are described for synchronizing audio and text data, e.g., in the form of electronic files, representing audio and text expressions of the same work or information. Also described are automated methods of detecting errors and other discrepancies between the audio and text versions of the same work. A speech recognition operation is first performed on the audio data using a speaker-independent acoustic model, producing recognized text along with audio time stamps. The recognized text is compared to the text data to identify correctly recognized words. The acoustic model is then retrained on the correctly recognized words and their corresponding audio segments, transforming the initial acoustic model into a speaker-trained acoustic model. The retrained acoustic model is used to perform an additional speech recognition operation on the audio data, and the audio and text data are synchronized using its results. In addition, one or more error reports based on the final recognition results are generated, showing discrepancies between the recognized words and the words in the text. Retraining the acoustic model in this manner improves recognition accuracy.
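The retraining loop in the abstract above can be sketched as a toy selection of adaptation data. The recognizer is stubbed out, and the function name and the positional word-by-word comparison are illustrative assumptions, not the patent's actual alignment procedure.

```python
def select_adaptation_data(recognized, reference):
    """recognized: list of (word, start_time, end_time) from a first
    recognition pass; reference: the known text, as a word list.
    Keeps only the correctly recognized words, whose audio segments
    can then be used to retrain the acoustic model."""
    correct = []
    for (word, start, end), ref_word in zip(recognized, reference):
        if word == ref_word:
            correct.append((word, start, end))
    return correct

# Toy first-pass output with audio time stamps.
recognized = [("the", 0.0, 0.2), ("cat", 0.2, 0.5), ("sad", 0.5, 0.9)]
reference = ["the", "cat", "sat"]

adaptation = select_adaptation_data(recognized, reference)
# Misrecognized words would instead appear in the discrepancy report.
errors = [(rec, ref) for rec, ref in zip(recognized, reference) if rec[0] != ref]
```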
    • 2. Invention grant
    • Title: Methods and apparatus for automatically synchronizing electronic audio files with electronic text files
    • Publication No.: US06260011B1
    • Publication Date: 2001-07-10
    • Application No.: US09531054
    • Filing Date: 2000-03-20
    • Inventors: David E. Heckerman; Fileno A. Alleva; Robert L. Rounthwaite; Daniel Rosen; Mei-Yuh Hwang; Yoram Yaacovi; John L. Manferdelli
    • IPC: G10L15/08
    • CPC: H04N21/466; G06F17/30017; G09B5/062; G10L15/08; G10L15/26; H04N21/4307; H04N21/435; H04N21/4394; H04N21/8106; H04N21/8133
    • Abstract: Automated methods and apparatus are described for synchronizing audio and text data, e.g., in the form of electronic files, representing audio and text expressions of the same work or information. A statistical language model is generated from the text data. A speech recognition operation is then performed on the audio data using the generated language model and a speaker-independent acoustic model. Silence is modeled as a word which can be recognized. The speech recognition operation produces a time-indexed set of recognized words, some of which may be silence. The recognized words are globally aligned with the words in the text data. Recognized periods of silence which correspond to expected periods of silence, and which are adjoined by one or more correctly recognized words, are identified as points where the text and audio files should be synchronized, e.g., by the insertion of bi-directional pointers. In one embodiment, for a text location to be identified for synchronization purposes, both words which bracket (i.e., precede and follow) the recognized silence must be correctly recognized. Pointers corresponding to the identified silence locations are inserted into the text and/or audio files at those locations. Audio time stamps obtained from the speech recognition operation may be used as the bi-directional pointers. Synchronized text and audio data may be output in a variety of file formats.
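The silence-bracketing rule described above can be sketched as follows. The alignment is given as input, and the `SIL` token and all data values are illustrative assumptions.

```python
SIL = "<sil>"

def sync_points(aligned):
    """aligned: list of (recognized, reference, time_stamp) triples after
    global alignment, with silence modeled as the recognizable word SIL.
    A recognized silence becomes a synchronization point only when both
    bracketing words were correctly recognized; its audio time stamp then
    serves as the bi-directional pointer."""
    points = []
    for i in range(1, len(aligned) - 1):
        rec, ref, t = aligned[i]
        if rec == SIL and ref == SIL:
            prev_ok = aligned[i - 1][0] == aligned[i - 1][1]
            next_ok = aligned[i + 1][0] == aligned[i + 1][1]
            if prev_ok and next_ok:
                points.append(t)
    return points

aligned = [
    ("hello", "hello", 0.0),
    (SIL, SIL, 0.6),        # bracketed by two correct words -> sync point
    ("world", "world", 0.9),
    (SIL, SIL, 1.4),        # following word misrecognized -> rejected
    ("see", "sea", 1.7),
]
```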
    • 4. Invention grant
    • Title: Speech recognition with mixtures of Bayesian networks
    • Publication No.: US06336108B1
    • Publication Date: 2002-01-01
    • Application No.: US09220197
    • Filing Date: 1998-12-23
    • Inventors: Bo Thiesson; Christopher A. Meek; David Maxwell Chickering; David Earl Heckerman; Fileno A. Alleva; Mei-Yuh Hwang
    • IPC: G06F15/18
    • CPC: G06K9/6296; G06N5/025; Y10S707/99945; Y10S707/99948
    • Abstract: The invention performs speech recognition using an array of mixtures of Bayesian networks. A mixture of Bayesian networks (MBN) consists of plural hypothesis-specific Bayesian networks (HSBNs) having possibly hidden and observed variables. A common external hidden variable is associated with the MBN, but is not included in any of the HSBNs. The number of HSBNs in the MBN corresponds to the number of states of the common external hidden variable, and each HSBN models the world under the hypothesis that the common external hidden variable is in a corresponding one of those states. In accordance with the invention, the MBNs encode the probabilities of observing the sets of acoustic observations given the utterance of a respective one of said parts of speech. Each of the HSBNs encodes the probabilities of observing the sets of acoustic observations given the utterance of a respective one of the parts of speech and given the hidden common variable being in a particular state. Each HSBN has nodes corresponding to the elements of the acoustic observations. These nodes store probability parameters corresponding to the probabilities, with causal links representing dependencies between ones of said nodes.
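The mixture scoring described above (marginalizing the common external hidden variable over its states) can be sketched with one-dimensional Gaussians standing in for full HSBNs. All parameters, names, and the two-unit model set are illustrative assumptions.

```python
import math

def gaussian(mean, var):
    """1-D Gaussian density; a toy stand-in for an HSBN's score."""
    return lambda x: math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def mbn_likelihood(x, mixture):
    """mixture: list of (prior, hsbn). Marginalizes the common external
    hidden variable: each HSBN scores the observation under the hypothesis
    that the hidden variable is in that HSBN's state."""
    return sum(prior * hsbn(x) for prior, hsbn in mixture)

# One MBN per speech unit; parameters are made up for illustration.
models = {
    "aa": [(0.6, gaussian(1.0, 0.5)), (0.4, gaussian(1.5, 0.5))],
    "iy": [(0.5, gaussian(4.0, 0.5)), (0.5, gaussian(4.5, 0.5))],
}

def classify(x):
    """Return the unit whose MBN assigns the observation the highest score."""
    return max(models, key=lambda unit: mbn_likelihood(x, models[unit]))
```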
    • 7. Invention grant
    • Title: Method and apparatus for the recognition of spelled spoken words
    • Publication No.: US06694296B1
    • Publication Date: 2004-02-17
    • Application No.: US09706375
    • Filing Date: 2000-11-03
    • Inventors: Fileno A. Alleva; Mei-Yuh Hwang; Yun-Cheng Ju
    • IPC: G10L15/28
    • CPC: G10L15/197; G10L2015/086
    • Abstract: The speech recognizer includes a dictation language model providing a dictation model output indicative of a likely word sequence recognized based on an input utterance. A spelling language model provides a spelling model output indicative of a likely letter sequence recognized based on the input utterance. An acoustic model provides an acoustic model output indicative of a likely speech unit recognized based on the input utterance. A speech recognition component is configured to access the dictation language model, the spelling language model and the acoustic model. The speech recognition component weights the dictation model output and the spelling model output in calculating likely recognized speech based on the input utterance. The speech recognizer can also be configured to confine spelled speech to an active lexicon.
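One plausible form of the weighting described above is a log-domain combination of the acoustic score with a linear interpolation of the two language-model probabilities. The interpolation form, weight values, and hypothesis data below are assumptions for illustration; the patent only states that the two model outputs are weighted.

```python
import math

def combined_log_score(acoustic, p_dictation, p_spelling, weight):
    """Interpolate the dictation and spelling LM probabilities, then add
    the acoustic log score (a standard log-domain combination)."""
    lm = weight * p_dictation + (1.0 - weight) * p_spelling
    return acoustic + math.log(lm)

# Competing hypotheses: (acoustic log score, P_dictation, P_spelling).
hypotheses = {
    "file":    (-10.0, 0.020, 0.001),   # favored by the dictation model
    "F I L E": (-11.0, 0.0001, 0.030),  # favored by the spelling model
}

def best(weight):
    """Pick the hypothesis with the highest combined score."""
    return max(hypotheses, key=lambda h: combined_log_score(*hypotheses[h], weight))
```

Shifting the weight toward the dictation model favors ordinary words; shifting it toward the spelling model favors letter sequences.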
    • 8. Invention grant
    • Title: Senone tree representation and evaluation
    • Publication No.: US5794197A
    • Publication Date: 1998-08-11
    • Application No.: US850061
    • Filing Date: 1997-05-02
    • Inventors: Fileno A. Alleva; Xuedong Huang; Mei-Yuh Hwang
    • IPC: G10L15/02; G10L15/06; G10L15/14; G10L15/18; G10L5/06
    • CPC: G10L15/146; G10L15/187; G10L2015/0631
    • Abstract: A speech recognition method provides improved modeling and recognition accuracy using hidden Markov models. During training, the method creates a senone tree for each state of each phoneme encountered in a data set of training words. All output distributions received for a selected state of a selected phoneme in the set of training words are clustered together in a root node of a senone tree. Each node of the tree, beginning with the root node, is divided into two nodes by asking linguistic questions regarding the phonemes immediately to the left and right of the central phoneme of a triphone. At a predetermined point the tree creation stops, resulting in leaves representing clustered output distributions known as senones. The senone trees allow all possible triphones to be mapped into a sequence of senones simply by traversing the senone trees associated with the central phoneme of the triphone. As a result, unseen triphones not encountered in the training data can be modeled with senones created using the triphones actually found in the training data.
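The traversal described above can be sketched with a tiny hand-built tree for one HMM state of a central phoneme /t/. The questions, tree depth, and senone ids are illustrative assumptions, not taken from the patent.

```python
VOWELS = {"a", "e", "i", "o", "u"}

def senone_for(left, right):
    """Map the triphone left-t+right to a senone by traversing a toy
    senone tree. Internal nodes ask linguistic questions about the
    immediate left and right context phonemes; leaves are senone ids."""
    if left in VOWELS:          # Q: is the left context a vowel?
        if right in VOWELS:     # Q: is the right context a vowel?
            return "t_s2_senone0"
        return "t_s2_senone1"
    return "t_s2_senone2"

# An unseen triphone (say, u-t+e, never observed in training) still maps
# to a senone simply by answering the same questions.
```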
    • 9. Invention grant
    • Title: Modelling and processing filled pauses and noises in speech recognition
    • Publication No.: US07076422B2
    • Publication Date: 2006-07-11
    • Application No.: US10388259
    • Filing Date: 2003-03-13
    • Inventors: Mei-Yuh Hwang
    • IPC: G10L15/20
    • CPC: G10L15/142; G10L2021/02168
    • Abstract: A speech recognition system recognizes filled pause utterances made by a speaker. In one embodiment, an ergodic model is used to acoustically model filled pauses, providing flexibility that allows varying utterances of the filled pauses to be made. The ergodic HMM can also be used for other types of noise such as, but not limited to, breathing, keyboard operation, microphone noise, laughter, door openings and/or closings, or any other noise occurring in the environment of the user or made by the user. Similarly, silence can be modeled using an ergodic HMM. Recognition can be used with N-gram, context-free grammar, or hybrid language models.
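The difference between the ergodic topology described above and the left-to-right topology usually used for words can be sketched with toy transition matrices. All transition values are illustrative.

```python
N = 3  # number of HMM states

# Ergodic topology: every state can reach every state, so a filled pause
# or noise segment can loop freely and take essentially any form/length.
ergodic = [[1.0 / N] * N for _ in range(N)]

# Typical left-to-right word topology for contrast: each state may only
# self-loop or advance to the next state.
left_to_right = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]

def is_ergodic(transitions, eps=1e-12):
    """Here a topology counts as ergodic iff every transition
    probability is strictly positive (fully connected)."""
    return all(p > eps for row in transitions for p in row)
```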
    • 10. Invention grant
    • Title: Generating large units of graphonemes with mutual information criterion for letter to sound conversion
    • Publication No.: US07693715B2
    • Publication Date: 2010-04-06
    • Application No.: US10797358
    • Filing Date: 2004-03-10
    • Inventors: Mei-Yuh Hwang; Li Jiang
    • IPC: G10L15/04
    • CPC: G10L13/08
    • Abstract: A method and apparatus are provided for segmenting words into component parts. Under the invention, mutual information scores for pairs of graphoneme units found in a set of words are determined. Each graphoneme unit includes at least one letter. The graphoneme units of one pair are combined based on the mutual information score, forming a new graphoneme unit. Under one aspect of the invention, a syllable n-gram model is trained on words that have been segmented into syllables using mutual information. The syllable n-gram model is used to segment a phonetic representation of a new word into syllables. Similarly, an inventory of morphemes is formed using mutual information, and a morpheme n-gram model is trained that can be used to segment a new word into a sequence of morphemes.
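The pair-merging step described above can be sketched with a count-weighted pointwise mutual information score, one common variant of an MI criterion; the patent's exact formula may differ, and the toy data and function names are assumptions.

```python
import math
from collections import Counter

def best_merge(words):
    """words: list of unit sequences. Score every adjacent pair of units
    with count-weighted pointwise mutual information and return the
    highest-scoring pair as the next merge candidate."""
    unit_counts = Counter(u for w in words for u in w)
    pair_counts = Counter((w[i], w[i + 1]) for w in words for i in range(len(w) - 1))
    total = sum(unit_counts.values())

    def score(pair):
        a, b = pair
        c = pair_counts[pair]
        return c * math.log(c * total / (unit_counts[a] * unit_counts[b]))

    return max(pair_counts, key=score)

def merge(words, pair):
    """Replace every occurrence of the pair with one concatenated unit."""
    a, b = pair
    merged = []
    for w in words:
        out, i = [], 0
        while i < len(w):
            if i + 1 < len(w) and (w[i], w[i + 1]) == pair:
                out.append(a + b)
                i += 2
            else:
                out.append(w[i])
                i += 1
        merged.append(out)
    return merged

# Toy segmentations, starting from single letters as the initial units.
words = [list("tion"), list("nation"), list("action"), list("io")]
pair = best_merge(words)
new_words = merge(words, pair)
```

Repeating these two steps grows progressively larger graphoneme units out of frequently co-occurring pairs.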