    • 51. Invention application
    • SYSTEM AND METHOD FOR AUTOMATIC DETECTION OF ABNORMAL STRESS PATTERNS IN UNIT SELECTION SYNTHESIS
    • US20120035917A1
    • 2012-02-09
    • US12852146
    • 2010-08-06
    • Yeon-Jun KIM; Mark Charles BEUTNAGEL; Alistair D. CONKIE; Ann K. SYRDAL
    • G10L19/00
    • G10L13/033; G10L13/027; G10L13/043; G10L13/10; G10L15/1807; G10L25/00
    • Disclosed herein are systems, methods, and non-transitory computer-readable storage media for detecting and correcting abnormal stress patterns in unit-selection speech synthesis. A system practicing the method detects incorrect stress patterns in selected acoustic units representing speech to be synthesized, and corrects the incorrect stress patterns in the selected acoustic units to yield corrected stress patterns. The system can further synthesize speech based on the corrected stress patterns. In one aspect, the system also classifies the incorrect stress patterns using a machine learning algorithm such as a classification and regression tree, adaptive boosting, support vector machine, and maximum entropy. In this way a text-to-speech unit selection speech synthesizer can produce more natural sounding speech with suitable stress patterns regardless of the stress of units in a unit selection database.
    • 52. Granted invention patent
    • System and method for increasing recognition rates of in-vocabulary words by improving pronunciation modeling
    • US08095365B2
    • 2012-01-10
    • US12328436
    • 2008-12-04
    • Alistair D. Conkie; Mazin Gilbert; Andrej Ljolje
    • G10L13/08
    • G06F17/277; G10L15/063; G10L15/187
    • The present disclosure relates to systems, methods, and computer-readable media for generating a lexicon for use with speech recognition. The method includes receiving symbolic input as labeled speech data, overgenerating potential pronunciations based on the symbolic input, identifying potential pronunciations in a speech recognition context, and storing the identified potential pronunciations in a lexicon. Overgenerating potential pronunciations can include establishing a set of conversion rules for short sequences of letters, converting portions of the symbolic input into a number of possible lexical pronunciation variants based on the set of conversion rules, modeling the possible lexical pronunciation variants in one of a weighted network and a list of phoneme lists, and iteratively retraining the set of conversion rules based on improved pronunciations. Symbolic input can include multiple examples of a same spoken word. Speech data can be labeled explicitly or implicitly and can include words as text and recorded audio.
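The overgeneration step in the abstract above can be illustrated with a minimal sketch. It assumes a hypothetical rule table mapping short letter sequences to candidate phonemes; the function name `overgenerate`, the rule contents, and the phoneme symbols are illustrative assumptions, not taken from the patent. The later steps (filtering the variants in a speech recognition context and retraining the rules) are not shown.

```python
def overgenerate(word, rules, max_len=2):
    """Return every pronunciation reachable by segmenting `word` into rule keys.

    `rules` maps a short letter sequence to a list of candidate phonemes.
    An empty string means the letters may be silent. (Hypothetical format.)
    """
    results = set()

    def expand(pos, phones):
        if pos == len(word):
            results.add(" ".join(phones))
            return
        # Try every letter chunk of length 1..max_len starting at `pos`.
        for n in range(1, max_len + 1):
            chunk = word[pos:pos + n]
            if len(chunk) < n:
                break
            for phone in rules.get(chunk, ()):
                next_phones = (phones + [phone]) if phone else phones
                expand(pos + n, next_phones)

    expand(0, [])
    return sorted(results)


# Illustrative rules only; a real system would learn and retrain these.
RULES = {
    "p": ["p"], "h": ["hh"], "ph": ["f"],
    "o": ["ow", "aa"], "n": ["n"], "e": ["", "iy"],
}
variants = overgenerate("phone", RULES)
```

With these toy rules, `variants` contains both plausible and implausible pronunciations of "phone". That is the point of overgenerating: the abstract leaves the pruning to the recognition step that follows.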
    • 53. Invention application
    • SYSTEM AND METHOD FOR UNIT SELECTION TEXT-TO-SPEECH USING A MODIFIED VITERBI APPROACH
    • US20110313772A1
    • 2011-12-22
    • US12818835
    • 2010-06-18
    • Alistair D. CONKIE
    • G10L13/00
    • G10L13/02; G10L13/04; G10L13/06; G10L13/07
    • Disclosed herein are systems, methods, and non-transitory computer-readable storage media for speech synthesis. A system practicing the method receives a set of ordered lists of speech units, for each respective speech unit in each ordered list in the set of ordered lists, constructs a sublist of speech units from a next ordered list which are suitable for concatenation, performs a cost analysis of paths through the set of ordered lists of speech units based on the sublist of speech units for each respective speech unit, and synthesizes speech using a lowest cost path of speech units through the set of ordered lists based on the cost analysis. The ordered lists can be ordered based on the respective pitch of each speech unit. In one embodiment, speech units which do not have an assigned pitch can be assigned a pitch.
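The search described in the abstract above can be sketched as a dynamic program over pitch-ordered candidate lists. This is a rough illustration under stated assumptions, not the patented method: the unit format `(unit_id, pitch, target_cost)`, the `max_pitch_gap` window used to decide which units are "suitable for concatenation", and the absolute pitch difference used as join cost are all hypothetical choices.

```python
from bisect import bisect_left, bisect_right

INF = float("inf")


def lowest_cost_path(lists, max_pitch_gap=20.0):
    """lists[i] holds candidate units for position i, sorted by pitch.

    Each unit is (unit_id, pitch, target_cost). Returns (path, cost),
    or (None, inf) if no chain of compatible units exists.
    """
    # best[i][k] = (cheapest cost of a path ending at lists[i][k], back-pointer)
    best = [[(unit[2], None) for unit in lists[0]]]
    for i in range(1, len(lists)):
        prev, cur = lists[i - 1], lists[i]
        cur_pitches = [unit[1] for unit in cur]
        row = [(INF, None)] * len(cur)
        for j, (_, pitch, _) in enumerate(prev):
            cost_so_far = best[i - 1][j][0]
            if cost_so_far == INF:
                continue
            # Sublist of the next list whose pitch is close enough to join.
            lo = bisect_left(cur_pitches, pitch - max_pitch_gap)
            hi = bisect_right(cur_pitches, pitch + max_pitch_gap)
            for k in range(lo, hi):
                total = cost_so_far + abs(cur_pitches[k] - pitch) + cur[k][2]
                if total < row[k][0]:
                    row[k] = (total, j)
        best.append(row)

    last = best[-1]
    end = min(range(len(last)), key=lambda k: last[k][0])
    if last[end][0] == INF:
        return None, INF
    # Follow back-pointers from the cheapest final unit.
    path, k = [], end
    for i in range(len(lists) - 1, -1, -1):
        path.append(lists[i][k][0])
        k = best[i][k][1]
    path.reverse()
    return path, last[end][0]


candidates = [
    [("a1", 100, 0.0), ("a2", 150, 0.0)],
    [("b1", 105, 1.0), ("b2", 160, 0.0)],
    [("c1", 110, 0.0), ("c2", 200, 0.0)],
]
path, cost = lowest_cost_path(candidates)
```

Because each list is sorted by pitch, the compatible sublist is a contiguous slice found by binary search. Only those pairs are scored, rather than every pair of units in adjacent lists.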
    • 54. Invention application
    • SYSTEM AND METHOD FOR ADAPTING AUTOMATIC SPEECH RECOGNITION PRONUNCIATION BY ACOUSTIC MODEL RESTRUCTURING
    • US20100312560A1
    • 2010-12-09
    • US12480848
    • 2009-06-09
    • Andrej LJOLJE; Alistair D. CONKIE; Ann K. SYRDAL
    • G10L15/02
    • G10L17/14; G10L15/063; G10L15/07; G10L15/14; G10L15/187; G10L15/265; G10L15/30; G10L2015/025
    • Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally the method includes recognizing via a processor additional speech from the target speaker using the custom speech model.
    • 55. Granted invention patent
    • Automatic segmentation in speech synthesis
    • US07587320B2
    • 2009-09-08
    • US11832262
    • 2007-08-01
    • Alistair D. Conkie; Yeon-Jun Kim
    • G10L15/14
    • G10L13/06
    • Systems and methods for automatically segmenting speech inventories. A set of Hidden Markov Models (HMMs) are initialized using bootstrap data. The HMMs are next re-estimated and aligned to produce phone labels. The phone boundaries of the phone labels are then corrected using spectral boundary correction. Optionally, this process of using the spectral-boundary-corrected phone labels as input instead of the bootstrap data is performed iteratively in order to further reduce mismatches between manual labels and phone labels assigned by the HMM approach.
    • 57. Granted invention patent
    • Method and system for preselection of suitable units for concatenative speech
    • US06684187B1
    • 2004-01-27
    • US09607615
    • 2000-06-30
    • Alistair D. Conkie
    • G10L13/08
    • G10L13/07; G10L2015/022
    • A system and method for improving the response time of text-to-speech synthesis utilizes “triphone contexts” (i.e., triplets comprising a central phoneme and its immediate context) as the basic unit, instead of performing phoneme-by-phoneme synthesis. Prior to initiating the “real time” synthesis, a database is created of all possible triphones (there are approximately 10000 in the English language) and their associated preselection costs. At run time, therefore, only the most likely candidates are selected from the triphone database, significantly reducing the calculations that are required to be performed in real time.
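The offline/run-time split described in the abstract above can be sketched as a precomputed lookup table. The unit format, the `context_cost` function (a simple count of mismatched neighbours), the `n_best` cutoff, and the `#` boundary symbol are hypothetical choices for illustration; the abstract does not specify how preselection costs are computed.

```python
import heapq
from collections import defaultdict


def context_cost(unit, triphone):
    """Hypothetical preselection cost: number of mismatched neighbours."""
    left, _, right = triphone
    return int(unit["left"] != left) + int(unit["right"] != right)


def build_preselection_table(units, triphones, n_best=2):
    """Offline step: keep the n_best cheapest units for each triphone."""
    by_phone = defaultdict(list)
    for unit in units:
        by_phone[unit["phone"]].append(unit)
    table = {}
    for tri in triphones:
        scored = [(context_cost(u, tri), u["id"]) for u in by_phone[tri[1]]]
        table[tri] = heapq.nsmallest(n_best, scored)
    return table


def candidates_for(phonemes, table):
    """Run-time step: one table lookup per phoneme, no cost computation."""
    padded = ["#"] + list(phonemes) + ["#"]  # '#' marks utterance boundaries
    return [
        table.get(tuple(padded[i - 1:i + 2]), [])
        for i in range(1, len(padded) - 1)
    ]


units = [
    {"id": "u1", "phone": "ae", "left": "k", "right": "t"},
    {"id": "u2", "phone": "ae", "left": "b", "right": "t"},
    {"id": "u3", "phone": "ae", "left": "b", "right": "d"},
]
table = build_preselection_table(units, [("k", "ae", "t")])
shortlist = candidates_for(["k", "ae", "t"], table)
```

In this toy run only the middle triphone `("k", "ae", "t")` is in the table, so the first and last positions come back empty. A full table would cover every triphone the synthesizer can encounter.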