Quick Patent Search - Fast search of global patents, free commercial patent database - IPRDB

1. Granted invention patent

US06813604B1 Methods and apparatus for speaker specific durational adaptation (in force)
Publication number: US06813604B1
Publication date: 2004-11-02
Application number: US09711563
Filing date: 2000-11-13
Applicant: Chi-Lin Shih, Jan Pieter Hendrik van Santen
Inventor: Chi-Lin Shih, Jan Pieter Hendrik van Santen
IPC classification: G10L13/08
CPC classification: G10L13/033, G10L15/07, G10L2021/0135
Abstract: A text to speech system modeling durational characteristics of a target speaker is addressed herein. A body of target speaker training text is selected having maximum possible information about speaker specific characteristics. The body of target speaker training text is read by a target speaker to produce a target speaker training corpus. A previously generated source model reflecting characteristics of a source model is retrieved and the target speaker training corpus is processed to produce modification parameters reflecting differences between durational characteristics of the target speaker and those predicted by the source model. The modification parameters are applied to the source model to produce a target model. Text inputs are processed using the target model to produce speech outputs reflecting durational characteristics of the target speaker.
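
The adaptation step in this abstract (source model, modification parameters, target model) can be illustrated with a minimal sketch. It assumes, purely for illustration, that the source model is a table of mean durations per phone and that the modification parameters are per-phone multiplicative scale factors; the abstract specifies neither. The names `fit_scale_factors` and `apply_factors` and all data values are hypothetical.

```python
from collections import defaultdict


def fit_scale_factors(source_model, target_corpus):
    """Estimate one factor per phone: mean observed duration / source-predicted duration.

    ASSUMPTION: multiplicative per-phone factors stand in for the patent's
    unspecified "modification parameters".
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for phone, observed_ms in target_corpus:
        sums[phone] += observed_ms
        counts[phone] += 1
    return {
        phone: (sums[phone] / counts[phone]) / source_model[phone]
        for phone in sums
        if phone in source_model
    }


def apply_factors(source_model, factors):
    """Build the target model; phones unseen in training keep the source duration."""
    return {phone: dur * factors.get(phone, 1.0) for phone, dur in source_model.items()}


# Toy data (hypothetical): durations in milliseconds.
source_model = {"a": 100.0, "t": 50.0, "s": 80.0}
target_corpus = [("a", 120.0), ("a", 130.0), ("t", 40.0)]

factors = fit_scale_factors(source_model, target_corpus)
target_model = apply_factors(source_model, factors)
print(target_model)
```

Because the phone "s" never occurs in the toy training corpus, its duration falls back to the source model, which is one simple way to handle sparse target-speaker data.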

2. Granted invention patent

US07149690B2 Method and apparatus for interactive language instruction (in force)
Publication number: US07149690B2
Publication date: 2006-12-12
Application number: US09392844
Filing date: 1999-09-09
Applicant: Katherine Grace August, Nadine Blackwood, Qi P. Li, Michelle McNerney, Chi-Lin Shih, Arun Chandrasekaran Surendran, Jialin Zhong, Qiru Zhou
Inventor: Katherine Grace August, Nadine Blackwood, Qi P. Li, Michelle McNerney, Chi-Lin Shih, Arun Chandrasekaran Surendran, Jialin Zhong, Qiru Zhou
IPC classification: G09B5/06, G09B17/00, G10L15/22
CPC classification: G09B19/06, G09B5/04, G09B19/04, G10L13/00, G10L15/06
Abstract: A method and apparatus for interactive language instruction is provided that displays text files for processing, provides key features and functions for interactive learning, displays facial animation, and provides a workspace for language building functions. The system includes a stored set of language rules as part of the text-to-speech sub-system, as well as another stored set of rules as applied to the process of learning a language. The method implemented by the system includes digitally converting text to audible speech, providing the audible speech to a user or student (with the aid of an animated image in selected circumstances), prompting the student to replicate the audible speech, comparing the student's replication with the audible speech provided by the system, and providing feedback and reinforcement to the student by, for example, selectively recording or playing back the audible speech and the student's replication.

3. Granted invention patent

US06856958B2 Methods and apparatus for text to speech processing using language independent prosody markup (lapsed)
Publication number: US06856958B2
Publication date: 2005-02-15
Application number: US09845561
Filing date: 2001-04-30
Applicant: Gregory P. Kochanski, Chi-Lin Shih
Inventor: Gregory P. Kochanski, Chi-Lin Shih
IPC classification: G10L13/08, G10L21/00
CPC classification: G10L13/10, G10L13/04
Abstract: Techniques are described for employing a set of tags to model phenomena which are smooth and subject to constraints. Tags may be used to model, for example, muscular movement producing speech. In one advantageous application, a set of tags defining prosodic characteristics is developed, and selected tags are placed in appropriate locations of a body of text. Each tag defines a constraint on the prosodic characteristics of speech produced by processing the text. Processing of the body of speech and the tags produces a set of equations which are solved to produce a curve defining prosodic characteristics over the scope of a phrase, and a further set of equations which are solved to produce a curve defining prosodic characteristics of individual words within a phrase. The data defined by the curves is used with the text to produce speech having the prosodic characteristics defined by the tags. A set of tags may be produced by reading of a training text by a target speaker to produce a training corpus reflecting the prosodic characteristics of the target speaker, and then analyzing the training corpus to generate tags modeling the prosodic characteristics of the training corpus.

4. Granted invention patent

US06625576B2 Method and apparatus for performing text-to-speech conversion in a client/server environment (in force)
Publication number: US06625576B2
Publication date: 2003-09-23
Application number: US09772300
Filing date: 2001-01-29
Applicant: Gregory P. Kochanski, Joseph Philip Olive, Chi-Lin Shih
Inventor: Gregory P. Kochanski, Joseph Philip Olive, Chi-Lin Shih
IPC classification: G10L13/00
CPC classification: G10L13/047, G10L13/08
Abstract: A method and apparatus for performing text-to-speech conversion in a client/server environment partitions an otherwise conventional text-to-speech conversion algorithm into two portions: a first “text analysis” portion, which generates from an original input text an intermediate representation thereof, and a second “speech synthesis” portion, which synthesizes speech waveforms from the intermediate representation generated by the first portion (i.e., the text analysis portion). The text analysis portion of the algorithm is executed exclusively on a server while the speech synthesis portion is executed exclusively on a client which may be associated therewith. The client may comprise a hand-held device such as, for example, a cell phone, and the intermediate representation of the input text advantageously comprises at least a sequence of phonemes representative of the input text. Certain audio segment information which is to be used by the speech synthesis portion of the text-to-speech process may be advantageously transmitted by the server to the client, and a cache of such audio segments may then be advantageously maintained at the client (e.g., in the cell phone) for use by the speech synthesis process in order to obtain improved quality of the synthesized speech.
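
The server/client split and the client-side segment cache described in this abstract can be sketched as follows. This is an in-process toy under stated assumptions: the `Server` and `Client` classes, the lexicon, and the lists of integers standing in for audio samples are all hypothetical, and a real system would exchange these messages over a network rather than through direct method calls.

```python
class Server:
    """Runs text analysis only and serves audio segments on request."""

    def __init__(self, lexicon, segments):
        self.lexicon = lexicon        # word -> list of phonemes (toy data)
        self.segments = segments      # phoneme -> audio samples (toy data)
        self.segment_requests = 0     # counts how often a segment is transmitted

    def analyze(self, text):
        """Text analysis: turn input text into the intermediate phoneme sequence."""
        phonemes = []
        for word in text.lower().split():
            phonemes.extend(self.lexicon.get(word, []))
        return phonemes

    def fetch_segment(self, phoneme):
        self.segment_requests += 1
        return self.segments[phoneme]


class Client:
    """Runs speech synthesis only, keeping a local cache of audio segments."""

    def __init__(self, server):
        self.server = server
        self.cache = {}

    def speak(self, text):
        waveform = []
        for phoneme in self.server.analyze(text):
            if phoneme not in self.cache:  # request a segment only on a cache miss
                self.cache[phoneme] = self.server.fetch_segment(phoneme)
            waveform.extend(self.cache[phoneme])
        return waveform


# Toy data (hypothetical): two words that share the same phonemes.
server = Server(
    lexicon={"hi": ["HH", "AY"], "high": ["HH", "AY"]},
    segments={"HH": [1, 2], "AY": [3]},
)
client = Client(server)
waveform = client.speak("hi high")
print(waveform, server.segment_requests)
```

In this toy run each of the two segments is transmitted once, although both phonemes are synthesized twice; the second use is served from the client cache.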

5. Granted invention patent

US06810378B2 Method and apparatus for controlling a speech synthesis system to provide multiple styles of speech (in force)
Publication number: US06810378B2
Publication date: 2004-10-26
Application number: US09961923
Filing date: 2001-09-24
Applicant: Gregory P. Kochanski, Chi-Lin Shih
Inventor: Gregory P. Kochanski, Chi-Lin Shih
IPC classification: G10L13/02
CPC classification: G10L13/10
Abstract: A method and apparatus for synthesizing speech from text whereby the speech may be generated in a manner so as to effectively convey a particular, selectable style. Repeated patterns of one or more prosodic features (such as, for example, pitch, amplitude, spectral tilt, and/or duration) occurring at characteristic locations in the synthesized speech are advantageously used to convey a particular chosen style. For example, one or more of such feature patterns may be used to define a particular speaking style, and an illustrative text-to-speech system then makes use of such a defined style to adjust the specified parameter or parameters of the synthesized speech in a non-uniform manner (i.e., in accordance with the defined feature pattern or patterns).
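
The idea of adjusting a prosodic parameter non-uniformly according to a repeated feature pattern can be sketched as follows. The sketch assumes, for illustration only, that the parameter is a per-syllable pitch value, that the "characteristic locations" are stressed syllables, and that a style is a short list of multipliers applied cyclically; none of these choices comes from the abstract, and `apply_style` is a hypothetical name.

```python
def apply_style(pitch_hz, stressed, pattern):
    """Scale pitch at stressed syllables, cycling through the style's multipliers.

    ASSUMPTION: a style is modeled as a repeating list of pitch multipliers.
    Unstressed syllables are left unchanged, so the adjustment is non-uniform.
    """
    styled = list(pitch_hz)
    k = 0  # index of the next multiplier in the repeating pattern
    for i, is_stressed in enumerate(stressed):
        if is_stressed:
            styled[i] = pitch_hz[i] * pattern[k % len(pattern)]
            k += 1
    return styled


# Toy data (hypothetical): a flat 100 Hz contour over five syllables.
pitch = [100.0, 100.0, 100.0, 100.0, 100.0]
stressed = [True, False, True, False, True]
styled = apply_style(pitch, stressed, pattern=[1.5, 0.5])
print(styled)
```

A uniform adjustment would multiply every syllable by the same factor; here only the stressed positions change, and they alternate between a raised and a lowered pitch.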

6. Granted invention patent

US06272464B1 Method and apparatus for assembling a prediction list of name pronunciation variations for use during speech recognition (in force)
Publication number: US06272464B1
Publication date: 2001-08-07
Application number: US09534150
Filing date: 2000-03-27
Applicant: George A Kiraz, Joseph Philip Olive, Chi-Lin Shih
Inventor: George A Kiraz, Joseph Philip Olive, Chi-Lin Shih
IPC classification: G10L15/18
CPC classification: G10L13/08, G10L15/187
Abstract: Multiple, yet plausible, pronunciations of a proper name are generated based on one or more potential language origins of the name, and based further on the context in which the name is being spoken, namely, on characteristics of the population of potential speakers. Conventional techniques may be employed to identify likely candidates for the language origin of the name, and the characteristics of the speaker population on which the generation of the pronunciations is further based may comprise, for example, the national origin of the speakers, the purpose of the speech, the geographical location of the speakers, or the general level of sophistication of the speaker population. Specifically, a method and apparatus is provided for generating a plurality of plausible pronunciations for a proper name, the method or apparatus for use in performing speech recognition of speech utterances comprising the proper name by individuals within a given population of speakers, the method or apparatus comprising steps or means respectively for (a) identifying one or more of a plurality of languages as a potential origin of the proper name; and (b) generating a plurality of plausible pronunciations for the given proper name, one or more of the plurality of pronunciations based on the one or more identified languages, and the plurality of plausible pronunciations based further on one or more characteristics associated with the given population of speakers.
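
Steps (a) and (b) of this abstract can be sketched as follows. The sketch assumes that candidate origin languages have already been identified by some conventional technique, and it models the speaker population with just two hypothetical attributes: a native language and a set of familiar languages. The letter-to-sound rules are toy string substitutions, not real phonetics, and `pronunciation_list` is a hypothetical name.

```python
def pronunciation_list(name, origin_languages, rules, native_language, familiar_languages):
    """Assemble distinct plausible pronunciations of a proper name.

    ASSUMPTION: speakers always may use their native-language reading, and use
    an origin-language reading only if that language is familiar to them.
    """
    candidates = [rules[native_language](name)]
    for lang in origin_languages:
        if lang in familiar_languages and lang in rules:
            candidates.append(rules[lang](name))
    # Remove duplicates while keeping the original order.
    return list(dict.fromkeys(candidates))


# Toy letter-to-sound rules (hypothetical, for illustration only).
rules = {
    "en": lambda n: n.lower(),
    "de": lambda n: n.lower().replace("j", "y"),
    "es": lambda n: n.lower().replace("j", "h"),
}

variants = pronunciation_list(
    "Jan",
    origin_languages=["de", "es"],
    rules=rules,
    native_language="en",
    familiar_languages={"de"},
)
print(variants)
```

Changing `familiar_languages` changes the list for the same name, which mirrors the abstract's point that the prediction list depends on the speaker population as well as on the name's possible origins.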
