    • 1. Granted invention patent
    • Methods and apparatus for speaker specific durational adaptation
    • US06813604B1
    • 2004-11-02
    • US09711563
    • 2000-11-13
    • Chi-Lin Shih; Jan Pieter Hendrik van Santen
    • Chi-Lin Shih; Jan Pieter Hendrik van Santen
    • G10L13/08
    • G10L13/033; G10L15/07; G10L2021/0135
    • A text to speech system modeling durational characteristics of a target speaker is addressed herein. A body of target speaker training text is selected having maximum possible information about speaker specific characteristics. The body of target speaker training text is read by a target speaker to produce a target speaker training corpus. A previously generated source model reflecting characteristics of a source model is retrieved and the target speaker training corpus is processed to produce modification parameters reflecting differences between durational characteristics of the target speaker and those predicted by the source model. The modification parameters are applied to the source model to produce a target model. Text inputs are processed using the target model to produce speech outputs reflecting durational characteristics of the target speaker.
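The adaptation flow in the abstract above (source model, target speaker corpus, modification parameters, target model) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say what form the modification parameters take, so the per-phoneme multiplicative scale used here, the dictionary-based model, and the function names `fit_modification_params` and `apply_modification` are all hypothetical.

```python
from collections import defaultdict


def fit_modification_params(source_model, target_corpus):
    """Estimate one duration scale factor per phoneme (assumed parameter form).

    source_model: dict mapping phoneme -> predicted duration in ms
    target_corpus: list of (phoneme, observed duration in ms) pairs
    """
    observed = defaultdict(list)
    for phoneme, duration_ms in target_corpus:
        observed[phoneme].append(duration_ms)

    params = {}
    for phoneme, durations in observed.items():
        mean_observed = sum(durations) / len(durations)
        # Ratio of what the target speaker did to what the source model predicts.
        params[phoneme] = mean_observed / source_model[phoneme]
    return params


def apply_modification(source_model, params):
    """Build the target model; phonemes unseen in the corpus keep scale 1.0."""
    return {p: d * params.get(p, 1.0) for p, d in source_model.items()}


# Toy data, invented for the example.
source_model = {"a": 100.0, "t": 50.0, "s": 80.0}
target_corpus = [("a", 120.0), ("a", 130.0), ("t", 45.0)]

params = fit_modification_params(source_model, target_corpus)
target_model = apply_modification(source_model, params)
```

In this sketch only the phonemes that appear in the target corpus are adapted, which is why the choice of training text matters: the more phonemes and contexts it covers, the fewer entries fall back to the unmodified source prediction.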
    • 3. Granted invention patent
    • Methods and apparatus for text to speech processing using language independent prosody markup
    • US06856958B2
    • 2005-02-15
    • US09845561
    • 2001-04-30
    • Gregory P. Kochanski; Chi-Lin Shih
    • Gregory P. Kochanski; Chi-Lin Shih
    • G10L13/08; G10L21/00
    • G10L13/10; G10L13/04
    • Techniques are described for employing a set of tags to model phenomena which are smooth and subject to constraints. Tags may be used to model, for example, muscular movement producing speech. In one advantageous application, a set of tags defining prosodic characteristics is developed, and selected tags are placed in appropriate locations of a body of text. Each tag defines a constraint on the prosodic characteristics of speech produced by processing the text. Processing of the body of speech and the tags produces a set of equations which are solved to produce a curve defining prosodic characteristics over the scope of a phrase, and a further set of equations which are solved to produce a curve defining prosodic characteristics of individual words within a phrase. The data defined by the curves is used with the text to produce speech having the prosodic characteristics defined by the tags. A set of tags may be produced by reading of a training text by a target speaker to produce a training corpus reflecting the prosodic characteristics of the target speaker, and then analyzing the training corpus to generate tags modeling the prosodic characteristics of the training corpus.
    • 4. Granted invention patent
    • Method and apparatus for performing text-to-speech conversion in a client/server environment
    • US06625576B2
    • 2003-09-23
    • US09772300
    • 2001-01-29
    • Gregory P. Kochanski; Joseph Philip Olive; Chi-Lin Shih
    • Gregory P. Kochanski; Joseph Philip Olive; Chi-Lin Shih
    • G10L13/00
    • G10L13/047; G10L13/08
    • A method and apparatus for performing text-to-speech conversion in a client/server environment partitions an otherwise conventional text-to-speech conversion algorithm into two portions: a first “text analysis” portion, which generates from an original input text an intermediate representation thereof, and a second “speech synthesis” portion, which synthesizes speech waveforms from the intermediate representation generated by the first portion (i.e., the text analysis portion). The text analysis portion of the algorithm is executed exclusively on a server while the speech synthesis portion is executed exclusively on a client which may be associated therewith. The client may comprise a hand-held device such as, for example, a cell phone, and the intermediate representation of the input text advantageously comprises at least a sequence of phonemes representative of the input text. Certain audio segment information which is to be used by the speech synthesis portion of the text-to-speech process may be advantageously transmitted by the server to the client, and a cache of such audio segments may then be advantageously maintained at the client (e.g., in the cell phone) for use by the speech synthesis process in order to obtain improved quality of the synthesized speech.
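The server/client split described in the abstract above might look roughly like the sketch below. It is a single-process toy, not the patented implementation: the `Server` and `Client` classes, the one-letter-per-phoneme "text analysis", and the byte-string "audio segments" are all assumptions made for illustration, and no real network transport is involved.

```python
class Server:
    """Runs the text-analysis half and hands out audio segments on request."""

    def __init__(self, segments):
        self.segments = segments      # phoneme -> audio bytes (toy data)
        self.segment_requests = 0     # counts how often the client had to ask

    def analyze(self, text):
        # Toy text analysis: one "phoneme" per letter. A real front end would
        # normalize text, look up pronunciations, and so on.
        return [ch for ch in text.lower() if ch.isalpha()]

    def fetch_segment(self, phoneme):
        self.segment_requests += 1
        return self.segments[phoneme]


class Client:
    """Runs the synthesis half and keeps a local cache of audio segments."""

    def __init__(self, server):
        self.server = server
        self.cache = {}

    def speak(self, text):
        # Only the intermediate representation (phonemes) crosses the boundary.
        phonemes = self.server.analyze(text)
        audio = b""
        for phoneme in phonemes:
            if phoneme not in self.cache:
                self.cache[phoneme] = self.server.fetch_segment(phoneme)
            audio += self.cache[phoneme]
        return audio


server = Server({"h": b"H", "i": b"I"})
client = Client(server)
audio = client.speak("hi hi")
```

Here the second "hi" is synthesized entirely from the client-side cache, so the server is asked for each segment only once; that is the benefit the abstract attributes to keeping the cache on the hand-held device.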
    • 5. Granted invention patent
    • Method and apparatus for controlling a speech synthesis system to provide multiple styles of speech
    • US06810378B2
    • 2004-10-26
    • US09961923
    • 2001-09-24
    • Gregory P. Kochanski; Chi-Lin Shih
    • Gregory P. Kochanski; Chi-Lin Shih
    • G10L13/02
    • G10L13/10
    • A method and apparatus for synthesizing speech from text whereby the speech may be generated in a manner so as to effectively convey a particular, selectable style. Repeated patterns of one or more prosodic features—such as, for example, pitch, amplitude, spectral tilt, and/or duration—occurring at characteristic locations in the synthesized speech, are advantageously used to convey a particular chosen style. For example, one or more of such feature patterns may be used to define a particular speaking style, and an illustrative text-to-speech system then makes use of such a defined style to adjust the specified parameter or parameters of the synthesized speech in a non-uniform manner (i.e., in accordance with the defined feature pattern or patterns).
    • 6. Granted invention patent
    • Method and apparatus for assembling a prediction list of name pronunciation variations for use during speech recognition
    • US06272464B1
    • 2001-08-07
    • US09534150
    • 2000-03-27
    • George A Kiraz; Joseph Philip Olive; Chi-Lin Shih
    • George A Kiraz; Joseph Philip Olive; Chi-Lin Shih
    • G10L15/18
    • G10L13/08; G10L15/187
    • Multiple, yet plausible, pronunciations of a proper name are generated based on one or more potential language origins of the name, and based further on the context in which the name is being spoken—namely, on characteristics of the population of potential speakers. Conventional techniques may be employed to identify likely candidates for the language origin of the name, and the characteristics of the speaker population on which the generation of the pronunciations is further based may comprise, for example, the national origin of the speakers, the purpose of the speech, the geographical location of the speakers, or the general level of sophistication of the speaker population. Specifically, a method and apparatus is provided for generating a plurality of plausible pronunciations for a proper name, the method or apparatus for use in performing speech recognition of speech utterances comprising the proper name by individuals within a given population of speakers, the method or apparatus comprising steps or means respectively for (a) identifying one or more of a plurality of languages as a potential origin of the proper name; and (b) generating a plurality of plausible pronunciations for the given proper name, one or more of the plurality of pronunciations based on the one or more identified languages, and the plurality of plausible pronunciations based further on one or more characteristics associated with the given population of speakers.
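Steps (a) and (b) of the abstract above can be illustrated with a small sketch. Everything concrete in it is an assumption: the letter-to-sound tables in `RULES` are invented toy rules, the language-origin scores are supplied by the caller rather than computed, and the only population characteristic modeled is the speakers' own language, which is always added to the candidate list.

```python
# Hypothetical per-language letter-to-sound rules (illustration only).
RULES = {
    "english": {"j": "dZ", "w": "w"},
    "german": {"j": "j", "w": "v"},
    "spanish": {"j": "x", "w": "w"},
}


def spell_out(name, rules):
    """Map each letter through one language's rules; unknown letters pass through."""
    return " ".join(rules.get(ch, ch) for ch in name.lower())


def plausible_pronunciations(name, origin_scores, population_language, threshold=0.2):
    """Return a de-duplicated prediction list of pronunciations.

    origin_scores: language -> score from some origin classifier (assumed given)
    population_language: language of the expected speaker population
    """
    # Step (a): keep languages that are plausible origins of the name.
    languages = [lang for lang, score in origin_scores.items() if score >= threshold]
    # Step (b), population part: speakers may also use their own language's rules.
    if population_language not in languages:
        languages.append(population_language)

    seen, result = set(), []
    for lang in languages:
        pronunciation = spell_out(name, RULES[lang])
        if pronunciation not in seen:
            seen.add(pronunciation)
            result.append(pronunciation)
    return result


variants = plausible_pronunciations(
    "Jan", {"german": 0.6, "spanish": 0.3, "english": 0.1}, "english"
)
```

A recognizer could then accept any entry in `variants` as a match for the name; a fuller version would presumably weight the entries by how likely each is for the given population rather than treating them equally.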