Quick Patent Search - Search global patents fast, free commercial patent database - IPRDB

1. Invention application

US20080059190A1 Speech unit selection using HMM acoustic models (status: pending, published)
Publication No.: US20080059190A1
Publication date: 2008-03-06
Application No.: US11508093
Filing date: 2006-08-22
Applicants: Min Chu, Peng Liu, Yong Zhao, Yusheng Li
Inventors: Min Chu, Peng Liu, Yong Zhao, Yusheng Li
IPC classification: G10L13/00
CPC classification: G10L13/06
Abstract: A concatenating speech synthesizer concatenates selected speech units to obtain the desired synthesized speech. When desired speech units of phonetic and/or prosodic context are not available, the synthesizer selects replacement speech units based on measures representative of the difference between the HMM acoustic models of the desired speech unit and available speech units.
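
The replacement step in this abstract can be illustrated with a small sketch. The abstract does not say which difference measure is used, so the code below makes two loud assumptions: each unit's acoustic model is reduced to a single diagonal-covariance Gaussian (a mean vector and a variance vector) instead of a full HMM, and the difference measure is the symmetric Kullback-Leibler divergence. The names `symmetric_kl` and `pick_replacement` are hypothetical.

```python
import math


def symmetric_kl(mean_a, var_a, mean_b, var_b):
    """Symmetric KL divergence between two diagonal-covariance Gaussians."""
    total = 0.0
    for ma, va, mb, vb in zip(mean_a, var_a, mean_b, var_b):
        diff2 = (ma - mb) ** 2
        # Closed-form KL(a || b) and KL(b || a) for one dimension.
        kl_ab = 0.5 * (math.log(vb / va) + (va + diff2) / vb - 1.0)
        kl_ba = 0.5 * (math.log(va / vb) + (vb + diff2) / va - 1.0)
        total += kl_ab + kl_ba
    return total


def pick_replacement(desired, available):
    """Return the name of the available unit whose model is closest to `desired`.

    `desired` is a (mean, variance) pair; `available` maps unit names to
    (mean, variance) pairs. Both layouts are assumptions of this sketch.
    """
    return min(available, key=lambda name: symmetric_kl(*desired, *available[name]))
```

A real system would compare state sequences of full HMMs; the single-Gaussian simplification only shows the "pick the closest model" idea.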

2. Granted invention patent

US07418389B2 Defining atom units between phone and syllable for TTS systems (status: in force)
Publication No.: US07418389B2
Publication date: 2008-08-26
Application No.: US11033075
Filing date: 2005-01-11
Applicants: Min Chu, Yong Zhao
Inventors: Min Chu, Yong Zhao
IPC classification: G10L13/06, G10L13/00
CPC classification: G10L13/08
Abstract: A method for identifying common multiphone units to add to a unit inventory for a text-to-speech generator is disclosed. The common multiphone units are units that are larger than a phone, but smaller than a syllable. The method slices each syllable into a plurality of slices. These slices are then sorted and the frequency of each slice is determined. Those slices whose frequencies exceed a threshold are added to the unit inventory. The remaining slices are decomposed according to a predetermined set of rules to determine if they contain slices that should be added to the unit inventory.
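
The slice-count-threshold steps described in this abstract can be sketched as follows. This is a minimal illustration, assuming that a syllable is given as a sequence of phone labels and that a "slice" is any contiguous run of phones longer than one phone and shorter than the whole syllable. The rule-based decomposition of the remaining slices is not specified in the abstract and is left out. The function names are hypothetical.

```python
from collections import Counter


def slices_of(syllable):
    """Yield contiguous phone runs longer than one phone and shorter than the syllable."""
    n = len(syllable)
    for length in range(2, n):
        for start in range(n - length + 1):
            yield tuple(syllable[start:start + length])


def frequent_slices(syllables, threshold):
    """Count every slice over all syllables and keep those whose count exceeds `threshold`."""
    counts = Counter(s for syl in syllables for s in slices_of(syl))
    return {s: c for s, c in counts.items() if c > threshold}
```

With the toy syllables `("s","t","r","i","ng")`, `("s","t","r","ee","t")` and `("s","t","o","p")` and a threshold of 1, the slices that survive are `("s","t")`, `("t","r")` and `("s","t","r")`.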

3. Granted invention patent

US08583438B2 Unnatural prosody detection in speech synthesis (status: in force)
Publication No.: US08583438B2
Publication date: 2013-11-12
Application No.: US11903020
Filing date: 2007-09-20
Applicants: Yong Zhao, Frank Kao-ping Soong, Min Chu, Lijuan Wang
Inventors: Yong Zhao, Frank Kao-ping Soong, Min Chu, Lijuan Wang
IPC classification: G10L13/00
CPC classification: G10L13/10
Abstract: Described is a technology by which synthesized speech generated from text is evaluated against a prosody model (trained offline) to determine whether the speech will sound unnatural. If so, the speech is regenerated with modified data. The evaluation and regeneration may be iterative until deemed natural sounding. For example, text is built into a lattice that is then (e.g., Viterbi) searched to find a best path. The sections (e.g., units) of data on the path are evaluated via a prosody model. If the evaluation deems a section to correspond to unnatural prosody, that section is replaced, e.g., by modifying/pruning the lattice and re-performing the search. Replacement may be iterative until all sections pass the evaluation. Unnatural prosody detection may be biased such that during evaluation, unnatural prosody is falsely detected at a higher rate relative to a rate at which unnatural prosody is missed.
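
The search, evaluate, prune and re-search loop in this abstract can be sketched roughly as below. Everything concrete here is an assumption made for illustration: the lattice is a list of slots, each slot a list of candidate units stored as dicts with a `cost` field; `join_cost` and `is_unnatural` are caller-supplied stand-ins for the concatenation cost and the offline-trained prosody model; `max_rounds` is a hypothetical safety limit.

```python
def viterbi(lattice, join_cost):
    """Return the lowest-cost path (one candidate per slot) through the lattice."""
    # best[i][j] = (cost of the best path ending at candidate j of slot i, back-pointer)
    best = [[(c["cost"], None) for c in lattice[0]]]
    for i in range(1, len(lattice)):
        row = []
        for cand in lattice[i]:
            prev_cost, prev_j = min(
                (best[i - 1][j][0] + join_cost(p, cand), j)
                for j, p in enumerate(lattice[i - 1])
            )
            row.append((prev_cost + cand["cost"], prev_j))
        best.append(row)
    # Trace back from the cheapest candidate in the final slot.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(lattice) - 1, -1, -1):
        path.append(lattice[i][j])
        j = best[i][j][1]
    return path[::-1]


def synthesize(lattice, join_cost, is_unnatural, max_rounds=10):
    """Search, evaluate each unit on the path, prune flagged units, and search again."""
    lattice = [list(slot) for slot in lattice]  # work on a copy
    for _ in range(max_rounds):
        path = viterbi(lattice, join_cost)
        # Only prune a flagged unit if its slot still has an alternative.
        bad = [(i, u) for i, u in enumerate(path)
               if is_unnatural(u) and len(lattice[i]) > 1]
        if not bad:
            return path
        for i, u in bad:
            lattice[i].remove(u)
    return viterbi(lattice, join_cost)
```

Pruning only when a slot has an alternative is a design choice of this sketch, so that the search always has at least one candidate per slot.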

4. Invention application

US20050228664A1 Refining of segmental boundaries in speech waveforms using contextual-dependent models (status: lapsed)
Publication No.: US20050228664A1
Publication date: 2005-10-13
Application No.: US10823129
Filing date: 2004-04-13
Applicants: Yong Zhao, Min Chu, Jian-lai Zhou, Lijuan Wang
Inventors: Yong Zhao, Min Chu, Jian-lai Zhou, Lijuan Wang
IPC classification: G10L15/02, G10L15/06
CPC classification: G10L15/02, G10L2015/022
Abstract: A method and apparatus are provided for refining segmental boundaries in speech waveforms. Contextual acoustic feature similarities are used as a basis for clustering adjacent phoneme speech units, where each adjacent pair phoneme speech units include a segmental boundary. A refining model is trained for each cluster and used to refine boundaries of contextual phoneme speech units forming the clusters.

5. Invention application

US20050060155A1 Optimization of an objective measure for estimating mean opinion score of synthesized speech (status: lapsed)
Publication No.: US20050060155A1
Publication date: 2005-03-17
Application No.: US10660388
Filing date: 2003-09-11
Applicants: Min Chu, Hu Peng, Yong Zhao
Inventors: Min Chu, Hu Peng, Yong Zhao
IPC classification: G10L13/04, G10L19/00, G10L13/00
CPC classification: G10L25/69, G10L13/00
Abstract: A method is provided for optimizing an objective measure used to estimate mean opinion score or naturalness of synthesized speech from a speech synthesizer. The method includes using an objective measure that has components derived directly from textual information used to form synthesized utterances. The objective measure has a high correlation with mean opinion score such that a relationship can be formed between the objective measure and corresponding mean opinion score. The objective measure is altered to provide a different function of textual information derived from the utterances so as to improve the relationship between the scores of the objective measure and subjective ratings of the synthesized utterances.
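
The "relationship" between an objective measure and mean opinion score (MOS) mentioned in this abstract can be illustrated with a correlation and a straight-line fit. This is only a sketch under assumptions: the abstract does not state which statistic or fitting method is used, so Pearson correlation and ordinary least squares are stand-ins, and the function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between objective scores `xs` and MOS ratings `ys`."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares (slope, intercept) so that MOS is approximated by slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx
```

Under this reading, altering the objective measure would mean recomputing `xs` with a different function of the textual information and keeping the variant whose correlation with the subjective ratings is stronger.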

6. Invention application

US20090083036A1 Unnatural prosody detection in speech synthesis (status: in force)
Publication No.: US20090083036A1
Publication date: 2009-03-26
Application No.: US11903020
Filing date: 2007-09-20
Applicants: Yong Zhao, Frank Kao-ping Soong, Min Chu, Lijuan Wang
Inventors: Yong Zhao, Frank Kao-ping Soong, Min Chu, Lijuan Wang
IPC classification: G10L13/08, G06F17/30
CPC classification: G10L13/10
Abstract: Described is a technology by which synthesized speech generated from text is evaluated against a prosody model (trained offline) to determine whether the speech will sound unnatural. If so, the speech is regenerated with modified data. The evaluation and regeneration may be iterative until deemed natural sounding. For example, text is built into a lattice that is then (e.g., Viterbi) searched to find a best path. The sections (e.g., units) of data on the path are evaluated via a prosody model. If the evaluation deems a section to correspond to unnatural prosody, that section is replaced, e.g., by modifying/pruning the lattice and re-performing the search. Replacement may be iterative until all sections pass the evaluation. Unnatural prosody detection may be biased such that during evaluation, unnatural prosody is falsely detected at a higher rate relative to a rate at which unnatural prosody is missed.

7. Granted invention patent

US07693719B2 Providing personalized voice font for text-to-speech applications (status: lapsed)
Publication No.: US07693719B2
Publication date: 2010-04-06
Application No.: US10977178
Filing date: 2004-10-29
Applicants: Min Chu, Yong Zhao, Sheng Zhao
Inventors: Min Chu, Yong Zhao, Sheng Zhao
IPC classification: G10L21/00, G10L13/00, G06F3/16
CPC classification: G10L13/033, G10L2021/0135
Abstract: A method for synthesizing speech from text includes receiving one or more waveforms characteristic of a voice of a person selected by a user, generating a personalized voice font based on the one or more waveforms, and delivering the personalized voice font to the user's computer, whereby speech can be synthesized from text, the speech being in the voice of the selected person, the speech being synthesized using the personalized voice font. A system includes a text-to-speech (TTS) application operable to generate a voice font based on speech waveforms transmitted from a client computer remotely accessing the TTS application.

8. Granted invention patent

US07496512B2 Refining of segmental boundaries in speech waveforms using contextual-dependent models (status: lapsed)
Publication No.: US07496512B2
Publication date: 2009-02-24
Application No.: US10823129
Filing date: 2004-04-13
Applicants: Yong Zhao, Min Chu, Jian-lai Zhou, Lijuan Wang
Inventors: Yong Zhao, Min Chu, Jian-lai Zhou, Lijuan Wang
IPC classification: G10L17/00
CPC classification: G10L15/02, G10L2015/022
Abstract: A method and apparatus are provided for refining segmental boundaries in speech waveforms. Contextual acoustic feature similarities are used as a basis for clustering adjacent phoneme speech units, where each adjacent pair phoneme speech units include a segmental boundary. A refining model is trained for each cluster and used to refine boundaries of contextual phoneme speech units forming the clusters.

9. Granted invention patent

US07386451B2 Optimization of an objective measure for estimating mean opinion score of synthesized speech (status: lapsed)
Publication No.: US07386451B2
Publication date: 2008-06-10
Application No.: US10660388
Filing date: 2003-09-11
Applicants: Min Chu, Hu Peng, Yong Zhao
Inventors: Min Chu, Hu Peng, Yong Zhao
IPC classification: G10L13/08, G10L13/00
CPC classification: G10L25/69, G10L13/00
Abstract: A method is provided for optimizing an objective measure used to estimate mean opinion score or naturalness of synthesized speech from a speech synthesizer. The method includes using an objective measure that has components derived directly from textual information used to form synthesized utterances. The objective measure has a high correlation with mean opinion score such that a relationship can be formed between the objective measure and corresponding mean opinion score. The objective measure is altered to provide a different function of textual information derived from the utterances so as to improve the relationship between the scores of the objective measure and subjective ratings of the synthesized utterances.

10. Granted invention patent

US07496498B2 Front-end architecture for a multi-lingual text-to-speech system (status: lapsed)
Publication No.: US07496498B2
Publication date: 2009-02-24
Application No.: US10396944
Filing date: 2003-03-24
Applicants: Min Chu, Hu Peng, Yong Zhao
Inventors: Min Chu, Hu Peng, Yong Zhao
IPC classification: G06F17/20, G06F17/28, G10L11/00, G10L13/08, G10L21/00
CPC classification: G10L13/08
Abstract: A text processing system for processing multi-lingual text for a speech synthesizer includes a first language dependent module for performing at least one of text and prosody analysis on a portion of input text comprising a first language. A second language dependent module performs at least one of text and prosody analysis on a second portion of input text comprising a second language. A third module is adapted to receive outputs from the first and second dependent module and performs prosodic and phonetic context abstraction over the outputs based on multi-lingual text.
