Patent Quick Search: fast search of global patents, free commercial-use patent database (IPRDB)

1. Granted patent

US08086456B2 Methods and apparatus for rapid acoustic unit selection from a large speech corpus (status: in force)
Publication No.: US08086456B2
Publication date: 2011-12-27
Application No.: US12839937
Application date: 2010-07-20
Applicants: Mark Charles Beutnagel, Mehryar Mohri, Michael Dennis Riley
Inventors: Mark Charles Beutnagel, Mehryar Mohri, Michael Dennis Riley
IPC classification: G10L13/04, G10L13/06
CPC classification: G10L13/07, G10L13/027, G10L13/043, G10L13/08
Abstract: A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. However, as concatenation costs, which are measures of the mismatch between sequential pairs of acoustic units, are expensive to compute, processing can be greatly reduced by pre-computing and caching the concatenation costs. Unfortunately, the number of possible sequential pairs of acoustic units makes such caching prohibitive. A method for constructing an efficient concatenation cost database is provided by synthesizing a large body of speech, identifying the acoustic unit sequential pairs generated and their respective concatenation costs. By constructing a concatenation cost database in this fashion, the processing power required at run-time is greatly reduced with negligible effect on speech quality.
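Illustrative sketch (not taken from the patent text): the abstract describes choosing acoustic units so that a combination of target and concatenation costs is minimized for a sentence. One common way to do this is a Viterbi-style dynamic program over the candidate units at each position. The function name `select_units`, the unit labels, and the toy cost tables below are all hypothetical and serve only to show the shape of the search.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def select_units(
    candidates: Sequence[Sequence[str]],
    target_cost: Callable[[int, str], float],
    concat_cost: Callable[[str, str], float],
) -> Tuple[List[str], float]:
    """Pick one unit per position so that the summed cost is minimal."""
    # best[u] = (cost of the cheapest path ending in unit u, that path)
    best: Dict[str, Tuple[float, List[str]]] = {
        u: (target_cost(0, u), [u]) for u in candidates[0]
    }
    for pos in range(1, len(candidates)):
        nxt: Dict[str, Tuple[float, List[str]]] = {}
        for u in candidates[pos]:
            # Cheapest predecessor once the join cost into u is included.
            prev, (cost, path) = min(
                best.items(),
                key=lambda kv: kv[1][0] + concat_cost(kv[0], u),
            )
            total = cost + concat_cost(prev, u) + target_cost(pos, u)
            nxt[u] = (total, path + [u])
        best = nxt
    cost, path = min(best.values(), key=lambda v: v[0])
    return path, cost


# Toy data (hypothetical): two positions, two candidate units each.
TARGET = {"a1": 1.0, "a2": 0.5, "b1": 0.25, "b2": 0.125}
CONCAT = {("a1", "b1"): 0.0, ("a1", "b2"): 2.0,
          ("a2", "b1"): 1.0, ("a2", "b2"): 3.0}

path, cost = select_units(
    [["a1", "a2"], ["b1", "b2"]],
    target_cost=lambda pos, u: TARGET[u],
    concat_cost=lambda left, right: CONCAT[(left, right)],
)
print(path, cost)
```

In this toy case the unit with the lowest target cost at each position is not the one selected, because the join between the two cheapest units is expensive. The inner loop calls `concat_cost` once per candidate pair, which is why the abstract singles out concatenation costs as the expensive part.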

2. Patent application

US20100286986A1 Methods and Apparatus for Rapid Acoustic Unit Selection From a Large Speech Corpus (status: in force)
Publication No.: US20100286986A1
Publication date: 2010-11-11
Application No.: US12839937
Application date: 2010-07-20
Applicants: Mark Charles Beutnagel, Mehryar Mohri, Michael Dennis Riley
Inventors: Mark Charles Beutnagel, Mehryar Mohri, Michael Dennis Riley
IPC classification: G10L13/00
CPC classification: G10L13/07, G10L13/027, G10L13/043, G10L13/08
Abstract: A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. However, as concatenation costs, which are measures of the mismatch between sequential pairs of acoustic units, are expensive to compute, processing can be greatly reduced by pre-computing and caching the concatenation costs. Unfortunately, the number of possible sequential pairs of acoustic units makes such caching prohibitive. However, statistical experiments reveal that while about 85% of the acoustic units are typically used in common speech, less than 1% of the possible sequential pairs of acoustic units occur in practice. A method for constructing an efficient concatenation cost database is provided by synthesizing a large body of speech, identifying the acoustic unit sequential pairs generated and their respective concatenation costs, and storing those concatenation costs likely to occur. By constructing a concatenation cost database in this fashion, the processing power required at run-time is greatly reduced with negligible effect on speech quality.
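Illustrative sketch (an assumption-laden reading of the abstract, not the patented implementation): the idea of storing only the concatenation costs that are likely to occur can be pictured as a dictionary keyed by sequential unit pairs, filled from unit sequences produced while synthesizing a large body of text. The names `build_concat_cache` and `cached_cost`, the toy cost function, and the use of a fixed default for pairs that were never observed are all hypothetical; the abstract does not say how a cache miss is handled.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[str, str]


def build_concat_cache(
    synthesized: Iterable[List[str]],
    concat_cost: Callable[[str, str], float],
) -> Dict[Pair, float]:
    """Cache the join cost of every sequential unit pair seen in the corpus."""
    cache: Dict[Pair, float] = {}
    for units in synthesized:
        for left, right in zip(units, units[1:]):
            if (left, right) not in cache:
                # The expensive computation runs once per distinct pair.
                cache[(left, right)] = concat_cost(left, right)
    return cache


def cached_cost(cache: Dict[Pair, float], left: str, right: str,
                default: float) -> float:
    """Run-time lookup; unseen pairs fall back to a default (assumption)."""
    return cache.get((left, right), default)


calls = 0


def toy_cost(left: str, right: str) -> float:
    """Stand-in for an expensive acoustic mismatch measure (hypothetical)."""
    global calls
    calls += 1
    return float(abs(int(left[1:]) - int(right[1:])))


# Unit sequences as they might come out of offline synthesis runs.
corpus = [["u1", "u2", "u3"], ["u2", "u3", "u1"]]
cache = build_concat_cache(corpus, toy_cost)
print(len(cache), calls)                       # distinct pairs, cost calls
print(cached_cost(cache, "u3", "u1", 10.0))    # observed pair
print(cached_cost(cache, "u1", "u3", 10.0))    # unseen pair, default used
```

With three units there are nine possible ordered pairs, but only three are stored here. That mirrors, on a tiny scale, the abstract's observation that a small fraction of the possible pairs occurs in practice.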

3. Patent application

US20120136663A1 METHODS AND APPARATUS FOR RAPID ACOUSTIC UNIT SELECTION FROM A LARGE SPEECH CORPUS (status: in force)
Publication No.: US20120136663A1
Publication date: 2012-05-31
Application No.: US13306157
Application date: 2011-11-29
Applicants: Mark Charles Beutnagel, Mehryar Mohri, Michael Dennis Riley
Inventors: Mark Charles Beutnagel, Mehryar Mohri, Michael Dennis Riley
IPC classification: G10L13/00
CPC classification: G10L13/07, G10L13/027, G10L13/043, G10L13/08
Abstract: A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. However, as concatenation costs, which are measures of the mismatch between sequential pairs of acoustic units, are expensive to compute, processing can be greatly reduced by pre-computing and caching the concatenation costs. The number of possible sequential pairs of acoustic units makes such caching prohibitive. Statistical experiments reveal that while about 85% of the acoustic units are typically used in common speech, less than 1% of the possible sequential pairs of acoustic units occur in practice. The system synthesizes a large body of speech, identifies the acoustic unit sequential pairs generated and their respective concatenation costs, and stores those concatenation costs likely to occur.

4. Granted patent

US07761299B1 Methods and apparatus for rapid acoustic unit selection from a large speech corpus (status: in force)
Publication No.: US07761299B1
Publication date: 2010-07-20
Application No.: US12057020
Application date: 2008-03-27
Applicants: Mark Charles Beutnagel, Mehryar Mohri, Michael Dennis Riley
Inventors: Mark Charles Beutnagel, Mehryar Mohri, Michael Dennis Riley
IPC classification: G10L13/00, G10L13/06
CPC classification: G10L13/07, G10L13/027, G10L13/043, G10L13/08
Abstract: A speech synthesis system can select recorded speech fragments, or acoustic units, from a large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. Concatenation costs are expensive to compute. Processing is reduced by pre-computing and caching the concatenation costs. The number of possible sequential pairs of acoustic units makes such caching prohibitive. A method for constructing an efficient concatenation cost database is provided by synthesizing a large body of speech, identifying the acoustic unit sequential pairs generated and their respective concatenation costs, and storing those concatenation costs likely to occur.

5. Granted patent

US08315872B2 Methods and apparatus for rapid acoustic unit selection from a large speech corpus (status: in force)
Publication No.: US08315872B2
Publication date: 2012-11-20
Application No.: US13306157
Application date: 2011-11-29
Applicants: Mark Charles Beutnagel, Mehryar Mohri, Michael Dennis Riley
Inventors: Mark Charles Beutnagel, Mehryar Mohri, Michael Dennis Riley
IPC classification: G10L13/00, G10L13/06
CPC classification: G10L13/07, G10L13/027, G10L13/043, G10L13/08
Abstract: A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. However, as concatenation costs, which are measures of the mismatch between sequential pairs of acoustic units, are expensive to compute, processing can be greatly reduced by pre-computing and caching the concatenation costs. Unfortunately, the number of possible sequential pairs of acoustic units makes such caching prohibitive. However, statistical experiments reveal that while about 85% of the acoustic units are typically used in common speech, less than 1% of the possible sequential pairs of acoustic units occur in practice. A method for constructing an efficient concatenation cost database is provided by synthesizing a large body of speech, identifying the acoustic unit sequential pairs generated and their respective concatenation costs, and storing those concatenation costs likely to occur. By constructing a concatenation cost database in this fashion, the processing power required at run-time is greatly reduced with negligible effect on speech quality.

6. Granted patent

US06701295B2 Methods and apparatus for rapid acoustic unit selection from a large speech corpus (status: in force)
Publication No.: US06701295B2
Publication date: 2004-03-02
Application No.: US10359171
Application date: 2003-02-06
Applicants: Mark Charles Beutnagel, Mehryar Mohri, Michael Dennis Riley
Inventors: Mark Charles Beutnagel, Mehryar Mohri, Michael Dennis Riley
IPC classification: G10L13/06
CPC classification: G10L13/07
Abstract: A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. However, as concatenation costs, which are measures of the mismatch between sequential pairs of acoustic units, are expensive to compute, processing can be greatly reduced by pre-computing and caching the concatenation costs. Unfortunately, the number of possible sequential pairs of acoustic units makes such caching prohibitive. However, statistical experiments reveal that while about 85% of the acoustic units are typically used in common speech, less than 1% of the possible sequential pairs of acoustic units occur in practice. A method for constructing an efficient concatenation cost database is provided by synthesizing a large body of speech, identifying the acoustic unit sequential pairs generated and their respective concatenation costs, and storing those concatenation costs likely to occur. By constructing a concatenation cost database in this fashion, the processing power required at run-time is greatly reduced with negligible effect on speech quality.

7. Granted patent

US06697780B1 Method and apparatus for rapid acoustic unit selection from a large speech corpus (status: in force)
Publication No.: US06697780B1
Publication date: 2004-02-24
Application No.: US09557146
Application date: 2000-04-25
Applicants: Mark Charles Beutnagel, Mehryar Mohri, Michael Dennis Riley
Inventors: Mark Charles Beutnagel, Mehryar Mohri, Michael Dennis Riley
IPC classification: G10L13/04
CPC classification: G10L13/07
Abstract: A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. However, as concatenation costs, which are measures of the mismatch between sequential pairs of acoustic units, are expensive to compute, processing can be greatly reduced by pre-computing and caching the concatenation costs. Unfortunately, the number of possible sequential pairs of acoustic units makes such caching prohibitive. However, statistical experiments reveal that while about 85% of the acoustic units are typically used in common speech, less than 1% of the possible sequential pairs of acoustic units occur in practice. A method for constructing an efficient concatenation cost database is provided by synthesizing a large body of speech, identifying the acoustic unit sequential pairs generated and their respective concatenation costs, and storing those concatenation costs likely to occur. By constructing a concatenation cost database in this fashion, the processing power required at run-time is greatly reduced with negligible effect on speech quality.

8. Patent application

US20080312930A1 METHOD AND SYSTEM FOR ALIGNING NATURAL AND SYNTHETIC VIDEO TO SPEECH SYNTHESIS (status: in force)
Publication No.: US20080312930A1
Publication date: 2008-12-18
Application No.: US12193397
Application date: 2008-08-18
Applicants: Andrea Basso, Mark Charles Beutnagel, Joern Ostermann
Inventors: Andrea Basso, Mark Charles Beutnagel, Joern Ostermann
IPC classification: G10L13/00, G10L13/08, G06T13/00
CPC classification: G06T9/001, G10L13/00, G10L21/06, H04N21/2368, H04N21/4341
Abstract: According to MPEG-4's TTS architecture, facial animation can be driven by two streams simultaneously: text and Facial Animation Parameters. In this architecture, text input is sent to a Text-To-Speech converter at a decoder that drives the mouth shapes of the face. Facial Animation Parameters are sent from an encoder to the face over the communication channel. The present invention includes codes (known as bookmarks) in the text string transmitted to the Text-to-Speech converter, which bookmarks are placed between words as well as inside them. According to the present invention, the bookmarks carry an encoder time stamp. Due to the nature of text-to-speech conversion, the encoder time stamp does not relate to real-world time, and should be interpreted as a counter. In addition, the Facial Animation Parameter stream carries the same encoder time stamp found in the bookmark of the text. The system of the present invention reads the bookmark and provides the encoder time stamp as well as a real-time time stamp to the facial animation system. Finally, the facial animation system associates the correct facial animation parameter with the real-time time stamp using the encoder time stamp of the bookmark as a reference.
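Illustrative sketch (hypothetical throughout, not the MPEG-4 bitstream syntax or the patented system): the abstract describes bookmarks in the text that carry an encoder time stamp used as a counter, a real-time time stamp produced when the Text-to-Speech converter reaches each bookmark, and a Facial Animation Parameter stream that refers to the same encoder time stamps. The `{ets:N}` bookmark notation, the fixed per-character speaking rate, and the parameter names below are assumptions made only so that the matching step can be shown.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical bookmark notation: {ets:<counter>} embedded in the text.
BOOKMARK = re.compile(r"\{ets:(\d+)\}")


def bookmark_times(text: str, seconds_per_char: float) -> Dict[int, float]:
    """Map each bookmark's encoder time stamp to a playback time.

    A real converter would report timing from its own synthesis; a fixed
    duration per spoken character stands in for that here (assumption).
    """
    times: Dict[int, float] = {}
    spoken = 0  # speakable characters consumed before the current bookmark
    pos = 0     # index just past the previous bookmark
    for match in BOOKMARK.finditer(text):
        spoken += match.start() - pos
        pos = match.end()
        times[int(match.group(1))] = spoken * seconds_per_char
    return times


def schedule_faps(
    faps: List[Tuple[int, str]], times: Dict[int, float]
) -> List[Tuple[float, str]]:
    """Attach a real-time stamp to each parameter via its encoder time stamp."""
    return sorted((times[ets], fap) for ets, fap in faps if ets in times)


# Bookmarks may sit inside words, as the abstract notes.
text = "Hel{ets:1}lo wor{ets:2}ld"
fap_stream = [(2, "smile"), (1, "open_jaw")]  # (encoder time stamp, parameter)

times = bookmark_times(text, seconds_per_char=0.5)
schedule = schedule_faps(fap_stream, times)
print(times)
print(schedule)
```

The encoder time stamps never enter the schedule as times. They act only as keys that join the two streams, which matches the abstract's remark that they should be interpreted as a counter.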

9. Granted patent

US06567779B1 Method and system for aligning natural and synthetic video to speech synthesis (status: lapsed)
Publication No.: US06567779B1
Publication date: 2003-05-20
Application No.: US08905931
Application date: 1997-08-05
Applicants: Andrea Basso, Mark Charles Beutnagel, Joern Ostermann
Inventors: Andrea Basso, Mark Charles Beutnagel, Joern Ostermann
IPC classification: G10L13/00
CPC classification: G10L15/24, G10L13/00, G10L2021/105, H04N19/20, H04N19/46, H04N19/61
Abstract: According to MPEG-4's TTS architecture, facial animation can be driven by two streams simultaneously: text and Facial Animation Parameters. In this architecture, text input is sent to a Text-To-Speech converter at a decoder that drives the mouth shapes of the face. Facial Animation Parameters are sent from an encoder to the face over the communication channel. The present invention includes codes (known as bookmarks) in the text string transmitted to the Text-to-Speech converter, which bookmarks are placed between words as well as inside them. According to the present invention, the bookmarks carry an encoder time stamp. Due to the nature of text-to-speech conversion, the encoder time stamp does not relate to real-world time, and should be interpreted as a counter. In addition, the Facial Animation Parameter stream carries the same encoder time stamp found in the bookmark of the text. The system of the present invention reads the bookmark and provides the encoder time stamp as well as a real-time time stamp to the facial animation system. Finally, the facial animation system associates the correct facial animation parameter with the real-time time stamp using the encoder time stamp of the bookmark as a reference.

10. Granted patent

US07844463B2 Method and system for aligning natural and synthetic video to speech synthesis (status: in force)
Publication No.: US07844463B2
Publication date: 2010-11-30
Application No.: US12193397
Application date: 2008-08-18
Applicants: Andrea Basso, Mark Charles Beutnagel, Joern Ostermann
Inventors: Andrea Basso, Mark Charles Beutnagel, Joern Ostermann
IPC classification: G10L13/00, G06T13/00
CPC classification: G06T9/001, G10L13/00, G10L21/06, H04N21/2368, H04N21/4341
Abstract: According to MPEG-4's TTS architecture, facial animation can be driven by two streams simultaneously: text and Facial Animation Parameters. A Text-To-Speech converter drives the mouth shapes of the face. An encoder sends Facial Animation Parameters to the face. The text input can include codes, or bookmarks, transmitted to the Text-to-Speech converter, which are placed between and inside words. The bookmarks carry an encoder time stamp. Due to the nature of text-to-speech conversion, the encoder time stamp does not relate to real-world time, and should be interpreted as a counter. The Facial Animation Parameter stream carries the same encoder time stamp found in the bookmark of the text. The system reads the bookmark and provides the encoder time stamp and a real-time time stamp. The facial animation system associates the correct facial animation parameter with the real-time time stamp using the encoder time stamp of the bookmark as a reference.
