Quick Patent Search - Fast search of global patents, free commercial-use patent database - IPRDB

1. Invention application

US20110131044A1 TARGET VOICE EXTRACTION METHOD, APPARATUS AND PROGRAM PRODUCT (Lapsed)
Publication (announcement) No.: US20110131044A1
Publication (announcement) date: 2011-06-02
Application No.: US12955882
Filing date: 2010-11-29
Applicant: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
Inventor: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
IPC classification: G10L17/00
CPC classification: G10L25/78, G10L15/20, G10L21/028, G10L2021/02166
Abstract: An apparatus, program product and method is provided for separating a target voice from a plurality of other voices having different directions of arrival. The method comprises the steps of disposing a first and a second voice input device at a predetermined distance from one another and upon receipt of voice signals at said devices calculating discrete Fourier transforms for the signals and calculating a CSP (cross-power spectrum phase) coefficient by superpositioning multiple frequency-bin components based on correlation of the two spectra signals received and then calculating a weighted CSP coefficient from said two discrete Fourier-transformed speech signals. A target voice is separated when received by said devices from other voice signals in a spectrum by using the calculated weighted CSP coefficient.
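
The CSP step named in this abstract can be illustrated with a minimal sketch. It assumes two equal-length frames from the two input devices and uses only the Python standard library. The function names `dft` and `csp` and the `weights` parameter are hypothetical; the abstract does not say how the weights are chosen, so they default to 1.0 for every frequency bin.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform; adequate for short frames."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def csp(x1, x2, weights=None):
    """Return one CSP coefficient per lag for two equal-length frames.

    weights[k] is a hypothetical per-bin weight (assumption, not from the
    abstract); None means every frequency bin contributes equally.
    """
    n = len(x1)
    s1, s2 = dft(x1), dft(x2)
    if weights is None:
        weights = [1.0] * n
    phase = []
    for k in range(n):
        # Cross-power spectrum of the two channels for bin k.
        cross = s2[k] * s1[k].conjugate()
        mag = abs(cross)
        # Keep only the phase; skip bins with (numerically) no energy.
        phase.append(weights[k] * cross / mag if mag > 1e-12 else 0j)
    # Superpose the frequency-bin components (inverse DFT, real part).
    return [
        sum(phase[k] * cmath.exp(2j * math.pi * k * lag / n) for k in range(n)).real / n
        for lag in range(n)
    ]
```

In this sketch the lag with the largest coefficient estimates how many samples the second channel trails the first, which is what relates the coefficient to a direction of arrival.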

2. Invention application

US20080270131A1 METHOD, PREPROCESSOR, SPEECH RECOGNITION SYSTEM, AND PROGRAM PRODUCT FOR EXTRACTING TARGET SPEECH BY REMOVING NOISE (In force)
Publication (announcement) No.: US20080270131A1
Publication (announcement) date: 2008-10-30
Application No.: US12105621
Filing date: 2008-04-18
Applicant: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
Inventor: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
IPC classification: G10L15/00, G10L19/14
CPC classification: G10L15/20, G10L15/02, G10L21/02, G10L21/0272, G10L2021/02161
Abstract: The present invention relates to a method, preprocessor, speech recognition system, and program product for extracting a target speech by removing noise. In an embodiment of the invention target speech is extracted from two input speeches, which are obtained through at least two speech input devices installed in different places in a space, applies a spectrum subtraction process by using a noise power spectrum (Uω) estimated by one or both of the two speech input devices (Xω(T)) and an arbitrary subtraction constant (α) to obtain a resultant subtracted power spectrum (Yω(T)). The invention further applies a gain control based on the two speech input devices to the resultant subtracted power spectrum to obtain a gain-controlled power spectrum (Dω(T)). The invention further applies a flooring process to said resultant gain-controlled power spectrum on the basis of arbitrary Flooring factor (β) to obtain a power spectrum for speech recognition (Zω(T)).
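
The three stages in this abstract (subtraction, gain control, flooring) can be sketched per frequency bin as follows. This is an illustration under stated assumptions, not the patented procedure: the function name is hypothetical, the gain is passed in as a single precomputed number because the abstract does not give its formula, and the floor is assumed to be β times the noise estimate.

```python
def extract_target_spectrum(x, u, alpha=1.0, beta=0.1, gain=1.0):
    """Sketch of subtraction -> gain control -> flooring for one frame.

    x: observed power spectrum X_w(T), one value per frequency bin
    u: estimated noise power spectrum U_w, same length as x
    alpha: subtraction constant; beta: flooring factor
    gain: hypothetical precomputed gain (assumption; the abstract only says
          it is derived from the two input devices)
    """
    z = []
    for xw, uw in zip(x, u):
        yw = xw - alpha * uw   # spectral subtraction -> Y_w(T)
        dw = gain * yw         # gain control -> D_w(T)
        floor = beta * uw      # assumed floor level
        z.append(dw if dw >= floor else floor)  # flooring -> Z_w(T)
    return z
```

Flooring keeps every bin positive, which matters when a later stage takes the logarithm of the spectrum.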

3. Granted invention patent

US08930185B2 Speech feature extraction apparatus, speech feature extraction method, and speech feature extraction program (In force)
Publication (announcement) No.: US08930185B2
Publication (announcement) date: 2015-01-06
Application No.: US13392901
Filing date: 2010-07-12
Applicant: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
Inventor: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
IPC classification: G10L19/02, G10L15/02, G10L15/20, G10L25/24
CPC classification: G10L15/02, G10L15/20, G10L25/24
Abstract: A speech feature extraction apparatus, speech feature extraction method, and speech feature extraction program. A speech feature extraction apparatus includes: first difference calculation module to: (i) receive, as an input, a spectrum of a speech signal segmented into frames for each frequency bin; and (ii) calculate a delta spectrum for each of the frame, where the delta spectrum is a difference of the spectrum within continuous frames for the frequency bin; and first normalization module to normalize the delta spectrum of the frame for the frequency bin by dividing the delta spectrum by a function of an average spectrum; where the average spectrum is an average of spectra through all frames that are overall speech for the frequency bin; and where an output of the first normalization module is defined as a first delta feature.
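
The delta feature described in this abstract can be sketched as below. The sketch assumes the simplest possible choices: the delta is the difference between adjacent frames, and the normalizing function `f` is the identity. Both are assumptions for illustration; the abstract only says "a difference of the spectrum within continuous frames" and "a function of an average spectrum". The function name is hypothetical.

```python
def normalized_delta(spec, f=lambda avg: avg):
    """Return normalized delta spectra for frames 1..T-1.

    spec[t][k] is the spectrum value at frame t and frequency bin k.
    f is the (assumed) function applied to the per-bin average spectrum
    before dividing; the identity is used by default.
    """
    n_frames, n_bins = len(spec), len(spec[0])
    # Average spectrum per bin over all frames of the utterance.
    avg = [sum(spec[t][k] for t in range(n_frames)) / n_frames
           for k in range(n_bins)]
    # Frame-to-frame difference, divided by f(average) for that bin.
    return [
        [(spec[t][k] - spec[t - 1][k]) / f(avg[k]) for k in range(n_bins)]
        for t in range(1, n_frames)
    ]
```

Dividing by the utterance-level average makes the feature insensitive to a constant per-bin gain, since scaling a bin scales both the difference and the average.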

4. Granted invention patent

US08812312B2 System, method and program for speech processing (In force)
Publication (announcement) No.: US08812312B2
Publication (announcement) date: 2014-08-19
Application No.: US12200610
Filing date: 2008-08-28
Applicant: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
Inventor: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
IPC classification: G10L15/00, G10L15/20, G10L17/00
CPC classification: G10L15/20, G10L15/02, G10L25/24
Abstract: The present invention relates to a system, method and program for speech recognition. In an embodiment of the invention a method for processing a speech signal consists of receiving a power spectrum of a speech signal and generating a log power spectrum signal of the power spectrum. The method further consists of performing discrete cosine transformation on the log power spectrum signal and cutting off cepstrum upper and lower terms of the discrete cosine transformed signal. The method further consists of performing inverse discrete cosine transformation on the signal from which the cepstrum upper and lower terms are cut off. The method further consists of converting the inverse discrete cosine transformed signal so as to bring the signal back to a power spectrum domain and filtering the power spectrum of the speech signal by using, as a filter, the signal which is brought back to the power spectrum domain.
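
The processing chain in this abstract (log, DCT, cut-off, inverse DCT, back to the power domain, filter) can be sketched with the standard library alone. The function names and the `low`/`high` cut-off indices are hypothetical, and applying the filter by per-bin multiplication is an assumption; the abstract only says the recovered signal is used "as a filter".

```python
import math


def dct(x):
    """Orthonormal DCT-II."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(s * math.sqrt((1.0 if k == 0 else 2.0) / n))
    return out


def idct(c):
    """Inverse of the orthonormal DCT-II above (a DCT-III)."""
    n = len(c)
    return [
        c[0] * math.sqrt(1.0 / n)
        + sum(c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
              for k in range(1, n))
        for i in range(n)
    ]


def cepstral_filter(power, low, high):
    """Keep cepstral terms with index in [low, high) and filter the spectrum.

    power: power spectrum of one frame (all values > 0)
    low, high: hypothetical cut-off indices (assumption for illustration)
    """
    log_power = [math.log(p) for p in power]
    cep = dct(log_power)
    # Cut off the lower and upper cepstral terms.
    kept = [c if low <= k < high else 0.0 for k, c in enumerate(cep)]
    # Back to the power spectrum domain.
    weights = [math.exp(v) for v in idct(kept)]
    # Assumed filtering rule: multiply the original spectrum bin by bin.
    return [p * w for p, w in zip(power, weights)]
```

With every cepstral term removed the weights are all 1 and the spectrum passes through unchanged, which is a convenient sanity check when choosing the cut-off indices.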

5. Invention application

US20120330657A1 SPEECH FEATURE EXTRACTION APPARATUS, SPEECH FEATURE EXTRACTION METHOD, AND SPEECH FEATURE EXTRACTION PROGRAM (In force)
Publication (announcement) No.: US20120330657A1
Publication (announcement) date: 2012-12-27
Application No.: US13604721
Filing date: 2012-09-06
Applicant: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
Inventor: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
IPC classification: G10L15/20
CPC classification: G10L15/02, G10L15/20, G10L25/24
Abstract: A speech feature extraction apparatus, speech feature extraction method, and speech feature extraction program. A speech feature extraction apparatus includes: first difference calculation module to: (i) receive, as an input, a spectrum of a speech signal segmented into frames for each frequency bin; and (ii) calculate a delta spectrum for each of the frame, where the delta spectrum is a difference of the spectrum within continuous frames for the frequency bin; and first normalization module to normalize the delta spectrum of the frame for the frequency bin by dividing the delta spectrum by a function of an average spectrum; where the average spectrum is an average of spectra through all frames that are overall speech for the frequency bin; and where an output of the first normalization module is defined as a first delta feature.

6. Granted invention patent

US07856353B2 Method for processing speech signal data with reverberation filtering (In force)
Publication (announcement) No.: US07856353B2
Publication (announcement) date: 2010-12-21
Application No.: US11834964
Filing date: 2007-08-07
Applicant: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
Inventor: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
IPC classification: G10L19/02, G10L15/20, G10L13/06, H04B3/20, H03G3/00, A61F11/06
CPC classification: G10L15/20, G10L19/04, G10L2021/02082
Abstract: Method for processing speech signal data. A speech signal is divided into frames. Each frame is characterized by a frame number T representing a unique interval of time. Each speech signal is characterized by a power spectrum with respect to frame T and frequency band ω. A speech segment and a reverberation segment of the speech signal is determined. L filter coefficients W(k) (k=1, 2, . . . , L) respectively corresponding to L frames immediately preceding frame T are computed such that the L filter coefficients minimize a function Φ that is a linear combination of sum of squares of a residual speech power in the reverberation segment and a sum of squares of a subtracted speech power in the speech segment. The computed L filter coefficients are stored within storage media of the computing apparatus.
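
One possible reading of the objective Φ in this abstract, written out for a single frequency band ω, is given below. It is a sketch under assumptions, not the formula from the patent: it assumes the reverberation power at frame T is predicted as a weighted sum of the power in the L preceding frames, that the "residual" is what remains after subtracting that prediction in the reverberation segment, and that the "subtracted power" is the prediction itself in the speech segment. The index sets S_rev and S_sp and the weight λ are hypothetical names.

```latex
\Phi = \sum_{T \in S_{\mathrm{rev}}}
         \Bigl( X_\omega(T) - \sum_{k=1}^{L} W(k)\, X_\omega(T-k) \Bigr)^{2}
     + \lambda \sum_{T \in S_{\mathrm{sp}}}
         \Bigl( \sum_{k=1}^{L} W(k)\, X_\omega(T-k) \Bigr)^{2}
```

Under this reading Φ is quadratic in the coefficients W(k), so the minimizer can be found by solving a linear least-squares problem; the second term discourages the filter from removing power during actual speech.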

7. Invention application

US20080059157A1 METHOD AND APPARATUS FOR PROCESSING SPEECH SIGNAL DATA (In force)
Publication (announcement) No.: US20080059157A1
Publication (announcement) date: 2008-03-06
Application No.: US11834756
Filing date: 2007-08-07
Applicant: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
Inventor: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
IPC classification: G10L19/04
CPC classification: G10L2021/02082
Abstract: Method and computing apparatus for processing speech signal data. A speech signal is divided into frames. Each frame is characterized by a frame number T representing a unique interval of time. Each speech signal is characterized by a power spectrum with respect to frame T and frequency band ω. A speech segment and a reverberation segment of the speech signal is determined. L filter coefficients W(k) (k=1, 2, . . . , L) respectively corresponding to L frames immediately preceding frame T are computed such that the L filter coefficients minimize a function Φ that is a linear combination of sum of squares of a residual speech power in the reverberation segment and a sum of squares of a subtracted speech power in the speech segment. The computed L filter coefficients are stored within storage media of the computing apparatus.

8. Granted invention patent

US09070375B2 Voice activity detection system, method, and program product (In force)
Publication (announcement) No.: US09070375B2
Publication (announcement) date: 2015-06-30
Application No.: US12394631
Filing date: 2009-02-27
Applicant: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
Inventor: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
IPC classification: G10L19/02, G10L25/93
CPC classification: G10L25/93
Abstract: A voice activity detection method in a low SNR environment. The voice activity detection is performed by extracting a long-term spectrum variation component and a harmonic structure as feature vectors from a speech signal and increasing difference in feature vectors between speech and non-speech (i) using the long-term spectrum variation component feature or (ii) using a long-term spectrum variation component extraction and a harmonic structure feature extraction. A correct rate and an accuracy rate of the voice activity detection is improved over conventional methods by using a long-term spectrum variation component having a window length over an average phoneme duration of an utterance in the speech signal. The voice activity detection system and method provides speech processing, automatic speech recognition, and speech output capable of very accurate voice activity detection.
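
A long-term spectrum variation component of the kind mentioned in this abstract can be approximated with the common regression-based delta formula evaluated over a long window. This is a generic sketch, not the method claimed in the patent: the function name, the `half_window` parameter, and the edge handling (clamping to the first or last frame) are assumptions. To match the abstract, `half_window` would be chosen so that the full window is longer than an average phoneme.

```python
def long_term_delta(frames, half_window):
    """Regression-based delta over a window of 2 * half_window + 1 frames.

    frames[t][k] is a spectral value (for example log power) at frame t and
    bin k. Frames beyond either end are replaced by the nearest edge frame.
    """
    n_frames, n_bins = len(frames), len(frames[0])
    denom = 2 * sum(n * n for n in range(1, half_window + 1))
    out = []
    for t in range(n_frames):
        row = []
        for k in range(n_bins):
            # Weighted differences between frames n steps ahead and behind.
            num = sum(
                n * (frames[min(t + n, n_frames - 1)][k]
                     - frames[max(t - n, 0)][k])
                for n in range(1, half_window + 1)
            )
            row.append(num / denom)
        out.append(row)
    return out
```

For a spectrum that changes linearly over time the result equals the slope per frame, and for a stationary spectrum it is zero, so slowly varying noise yields small values while speech onsets and offsets yield large ones.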

9. Invention application

US20120185243A1 SPEECH FEATURE EXTRACTION APPARATUS, SPEECH FEATURE EXTRACTION METHOD, AND SPEECH FEATURE EXTRACTION PROGRAM (In force)
Publication (announcement) No.: US20120185243A1
Publication (announcement) date: 2012-07-19
Application No.: US13392901
Filing date: 2010-07-10
Applicant: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
Inventor: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
IPC classification: G10L21/00
CPC classification: G10L15/02, G10L15/20, G10L25/24
Abstract: A speech feature extraction apparatus, speech feature extraction method, and speech feature extraction program. A speech feature extraction apparatus includes: first difference calculation module to: (i) receive, as an input, a spectrum of a speech signal segmented into frames for each frequency bin; and (ii) calculate a delta spectrum for each of the frame, where the delta spectrum is a difference of the spectrum within continuous frames for the frequency bin; and first normalization module to normalize the delta spectrum of the frame for the frequency bin by dividing the delta spectrum by a function of an average spectrum; where the average spectrum is an average of spectra through all frames that are overall speech for the frequency bin; and where an output of the first normalization module is defined as a first delta feature.

10. Invention application

US20090222258A1 VOICE ACTIVITY DETECTION SYSTEM, METHOD, AND PROGRAM PRODUCT (In force)
Publication (announcement) No.: US20090222258A1
Publication (announcement) date: 2009-09-03
Application No.: US12394631
Filing date: 2009-02-27
Applicant: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
Inventor: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
IPC classification: G10L19/02
CPC classification: G10L25/93
Abstract: A voice activity detection method in a low SNR environment. The voice activity detection is performed by extracting a long-term spectrum variation component and a harmonic structure as feature vectors from a speech signal and increasing difference in feature vectors between speech and non-speech (i) using the long-term spectrum variation component feature or (ii) using a long-term spectrum variation component extraction and a harmonic structure feature extraction. A correct rate and an accuracy rate of the voice activity detection is improved over conventional methods by using a long-term spectrum variation component having a window length over an average phoneme duration of an utterance in the speech signal. The voice activity detection system and method provides speech processing, automatic speech recognition, and speech output capable of very accurate voice activity detection.
