Quick Patent Search: fast search of global patents, free patent database for commercial use - IPRDB

1. Granted patent

US08494856B2 Speech synthesizer, speech synthesizing method and program product [Status: lapsed]
Publication (announcement) No.: US08494856B2
Publication (announcement) date: 2013-07-23
Application No.: US13271321
Filing date: 2011-10-12
Applicants: Javier Latorre, Masami Akamine
Inventors: Javier Latorre, Masami Akamine
IPC classification: G10L13/08, G10L13/00, G10L13/06, G10L15/06, G10L17/00, G09G5/22, G06F17/28
CPC classification: G10L13/10
Abstract: According to one embodiment, a speech synthesizer includes an analyzer, a first estimator, a selector, a generator, a second estimator, and a synthesizer. The analyzer analyzes text and extracts a linguistic feature. The first estimator selects a first prosody model adapted to the linguistic feature and estimates prosody information that maximizes a first likelihood representing probability of the selected first prosody model. The selector selects speech units that minimize a cost function determined in accordance with the prosody information. The generator generates a second prosody model that is a model of the prosody information of the speech units. The second estimator estimates prosody information that maximizes a third likelihood calculated on the basis of the first likelihood and a second likelihood representing probability of the second prosody model. The synthesizer generates synthetic speech by concatenating the speech units on the basis of the prosody information estimated by the second estimator.
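
The second estimation step described in this abstract can be pictured with a minimal sketch. It assumes, purely for illustration, that both prosody models are one-dimensional Gaussians over a single prosody value (for example the log-F0 of one unit) and that the third likelihood is a weighted product of the first and second likelihoods; the abstract specifies neither. The names `Gaussian`, `combined_estimate`, `w1`, and `w2` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gaussian:
    """A 1-D Gaussian prosody model (illustrative assumption)."""
    mean: float
    var: float


def combined_estimate(first: Gaussian, second: Gaussian,
                      w1: float = 1.0, w2: float = 1.0) -> float:
    """Return x maximizing w1*log N(x; first) + w2*log N(x; second).

    Setting the derivative of the weighted log-likelihood to zero
    gives a precision-weighted mean of the two model means.
    """
    p1 = w1 / first.var   # weighted precision of the first model
    p2 = w2 / second.var  # weighted precision of the second model
    return (p1 * first.mean + p2 * second.mean) / (p1 + p2)


# Hypothetical numbers: log-F0 predicted from text vs. observed in selected units.
text_model = Gaussian(mean=5.0, var=0.04)
unit_model = Gaussian(mean=5.2, var=0.01)
estimate = combined_estimate(text_model, unit_model)
```

Under these assumptions the estimate is pulled toward whichever model has the smaller variance, which is one plausible reading of why a second model built from the selected units would refine the text-based prediction.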

2. Granted patent

US08407053B2 Speech processing apparatus, method, and computer program product for synthesizing speech [Status: lapsed]
Publication (announcement) No.: US08407053B2
Publication (announcement) date: 2013-03-26
Application No.: US12405587
Filing date: 2009-03-17
Applicants: Javier Latorre, Masami Akamine
Inventors: Javier Latorre, Masami Akamine
IPC classification: G10L13/08
CPC classification: G10L13/0335, G10L13/10
Abstract: A speech processing apparatus, including a segmenting unit to divide a fundamental frequency signal of a speech signal corresponding to an input text into pitch segments, based on an alignment between samples of at least one given linguistic level included in the input text and the speech signal. Character strings of the input text are divided into the samples based on each linguistic level. A parameterizing unit generates a parametric representation of the pitch segments using a predetermined invertible operator and generates a group of first parameters in correspondence with each linguistic level. A descriptor generating unit generates, for each linguistic level, a descriptor that includes a set of features describing each sample in the input text and a model learning unit classifies the first parameters of each linguistic level of all speech signals in a memory into clusters based on the descriptor corresponding to the linguistic level.
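
The abstract only says that pitch segments are parameterized with "a predetermined invertible operator". As a hedged illustration, the sketch below uses an orthonormal DCT-II as that operator; this choice, and the function names `dct` and `idct`, are assumptions made for the example and are not taken from the record.

```python
import math


def dct(segment):
    """Orthonormal DCT-II of a pitch segment (list of F0 samples)."""
    n = len(segment)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(segment))
        coeffs.append(scale * total)
    return coeffs


def idct(coeffs):
    """Inverse of dct(): rebuild the pitch segment from its parameters."""
    n = len(coeffs)
    segment = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        segment.append(total)
    return segment


# Hypothetical F0 values (Hz) for one syllable-level pitch segment.
pitch_segment = [120.0, 125.0, 131.0, 128.0]
params = dct(pitch_segment)      # parametric representation
restored = idct(params)          # invertibility: recovers the segment
```

Because the transform is invertible, a contour estimated in the parameter domain can be mapped back to F0 samples without loss, which is the property the abstract relies on.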

3. Patent application

US20120065961A1 SPEECH MODEL GENERATING APPARATUS, SPEECH SYNTHESIS APPARATUS, SPEECH MODEL GENERATING PROGRAM PRODUCT, SPEECH SYNTHESIS PROGRAM PRODUCT, SPEECH MODEL GENERATING METHOD, AND SPEECH SYNTHESIS METHOD [Status: under examination, published]
Publication (announcement) No.: US20120065961A1
Publication (announcement) date: 2012-03-15
Application No.: US13238187
Filing date: 2011-09-21
Applicants: Javier Latorre, Masami Akamine
Inventors: Javier Latorre, Masami Akamine
IPC classification: G06F17/27
CPC classification: G10L13/08, G10L13/07
Abstract: According to one embodiment, a speech model generating apparatus includes a spectrum analyzer, a chunker, a parameterizer, a clustering unit, and a model training unit. The spectrum analyzer acquires a speech signal corresponding to text information and calculates a set of spectral coefficients. The chunker acquires boundary information indicating a beginning and an end of linguistic units and chunks the speech signal into linguistic units. The parameterizer calculates a set of spectral trajectory parameters for a trajectory of the spectral trajectory parameters of the linguistic unit on the basis of the spectral coefficients. The clustering unit clusters the spectral trajectory parameters calculated for each of the linguistic units into clusters on the basis of linguistic information. The model training unit obtains a trained spectral trajectory model indicating a characteristic of a cluster based on the spectral trajectory parameters belonging to the same cluster.

4. Granted patent

US08438014B2 Separating speech waveforms into periodic and aperiodic components, using artificial waveform generated from pitch marks [Status: in force]
Publication (announcement) No.: US08438014B2
Publication (announcement) date: 2013-05-07
Application No.: US13358702
Filing date: 2012-01-26
Applicants: Masahiro Morita, Javier Latorre, Takehiko Kagoshima
Inventors: Masahiro Morita, Javier Latorre, Takehiko Kagoshima
IPC classification: G10L11/06, G10L11/04
CPC classification: G10L25/93, G10L25/90
Abstract: According to one embodiment, in a speech processing device, an extractor windows a part of the speech signal and extracts a partial waveform. A calculator performs frequency analysis of the partial waveform to calculate a frequency spectrum. An estimator generates an artificial waveform that is a waveform according to an interval between the pitch marks for each harmonic component having a frequency that is a predetermined multiple of a fundamental frequency of the speech signal and estimates harmonic spectral features representing characteristics of the frequency spectrum of the harmonic component from each of the artificial waveforms. A separator separates the partial waveform into a periodic component produced from periodic vocal-fold vibration as an acoustic source and an aperiodic component produced from aperiodic acoustic sources other than the vocal-fold vibration by using the respective harmonic spectral features and the frequency spectrum of the partial waveform.
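
One way to picture "an artificial waveform ... according to an interval between the pitch marks" for a given harmonic is a cosine whose phase advances by 2π times the harmonic number over each pitch period. The sketch below implements only that reading; it is an illustrative assumption rather than the construction claimed in the patent, and `harmonic_waveform` and the sample pitch marks are hypothetical.

```python
import math


def harmonic_waveform(pitch_marks, harmonic, num_samples):
    """Cosine for one harmonic, phase-locked to the pitch marks.

    pitch_marks: increasing sample indices marking successive pitch periods.
    Within each period the phase runs from 0 to 2*pi*harmonic, so the
    instantaneous frequency follows the local pitch-mark interval.
    Samples outside the marked region are left at 0.0.
    """
    wave = [0.0] * num_samples
    for start, end in zip(pitch_marks, pitch_marks[1:]):
        period = end - start
        for n in range(start, min(end, num_samples)):
            phase = 2.0 * math.pi * harmonic * (n - start) / period
            wave[n] = math.cos(phase)
    return wave


marks = [0, 80, 164, 250]  # hypothetical pitch marks, in samples
second_harmonic = harmonic_waveform(marks, harmonic=2, num_samples=250)
```

Windowing such a waveform and taking its spectrum would give a template for how one harmonic smears across frequency when the pitch varies, which is roughly the role the abstract assigns to the harmonic spectral features.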

5. Patent application

US20090248417A1 SPEECH PROCESSING APPARATUS, METHOD, AND COMPUTER PROGRAM PRODUCT [Status: lapsed]
Publication (announcement) No.: US20090248417A1
Publication (announcement) date: 2009-10-01
Application No.: US12405587
Filing date: 2009-03-17
Applicants: Javier Latorre, Masami Akamine
Inventors: Javier Latorre, Masami Akamine
IPC classification: G10L13/08, G10L13/06, G10L13/00
CPC classification: G10L13/0335, G10L13/10
Abstract: A method to generate a pitch contour for speech synthesis is proposed. The method is based on finding the pitch contour that maximizes a total likelihood function created by the combination of all the statistical models of the pitch contour segments of an utterance, at one or multiple linguistic levels. These statistical models are trained from a database of spoken speech, by means of a decision tree that for each linguistic level clusters the parametric representation of the pitch segments extracted from the spoken speech data with some features obtained from the text associated with that speech data. The parameterization of the pitch segments is performed in such a way, the likelihood function of any linguistic level can be expressed in terms of the parameters of one of the levels, thus allowing the maximization to be calculated with respect to the parameters of that level. Moreover, the parameterization of that main level has to be invertible so that the final pitch contour is obtained from the parameters of that level by means of an inverse transformation.

6. Patent application

US20120185244A1 SPEECH PROCESSING DEVICE, SPEECH PROCESSING METHOD, AND COMPUTER PROGRAM PRODUCT [Status: in force]
Publication (announcement) No.: US20120185244A1
Publication (announcement) date: 2012-07-19
Application No.: US13358702
Filing date: 2012-01-26
Applicants: Masahiro Morita, Javier Latorre, Takehiko Kagoshima
Inventors: Masahiro Morita, Javier Latorre, Takehiko Kagoshima
IPC classification: G10L11/04
CPC classification: G10L25/93, G10L25/90
Abstract: According to one embodiment, in a speech processing device, an extractor windows a part of the speech signal and extracts a partial waveform. A calculator performs frequency analysis of the partial waveform to calculate a frequency spectrum. An estimator generates an artificial waveform that is a waveform according to an interval between the pitch marks for each harmonic component having a frequency that is a predetermined multiple of a fundamental frequency of the speech signal and estimates harmonic spectral features representing characteristics of the frequency spectrum of the harmonic component from each of the artificial waveforms. A separator separates the partial waveform into a periodic component produced from periodic vocal-fold vibration as an acoustic source and an aperiodic component produced from aperiodic acoustic sources other than the vocal-fold vibration by using the respective harmonic spectral features and the frequency spectrum of the partial waveform.

7. Patent application

US20120089402A1 SPEECH SYNTHESIZER, SPEECH SYNTHESIZING METHOD AND PROGRAM PRODUCT [Status: lapsed]
Publication (announcement) No.: US20120089402A1
Publication (announcement) date: 2012-04-12
Application No.: US13271321
Filing date: 2011-10-12
Applicants: Javier Latorre, Masami Akamine
Inventors: Javier Latorre, Masami Akamine
IPC classification: G10L13/08
CPC classification: G10L13/10
Abstract: According to one embodiment, a speech synthesizer includes an analyzer, a first estimator, a selector, a generator, a second estimator, and a synthesizer. The analyzer analyzes text and extracts a linguistic feature. The first estimator selects a first prosody model adapted to the linguistic feature and estimates prosody information that maximizes a first likelihood representing probability of the selected first prosody model. The selector selects speech units that minimize a cost function determined in accordance with the prosody information. The generator generates a second prosody model that is a model of the prosody information of the speech units. The second estimator estimates prosody information that maximizes a third likelihood calculated on the basis of the first likelihood and a second likelihood representing probability of the second prosody model. The synthesizer generates synthetic speech by concatenating the speech units on the basis of the prosody information estimated by the second estimator.
