    • 2. Granted invention patent
    • Method and apparatus for using formant models in speech systems
    • US06505152B1
    • 2003-01-07
    • US09389898
    • 1999-09-03
    • Alejandro Acero
    • Alejandro Acero
    • G10L1906
    • G10L13/04; G10L25/15
    • A model is provided for formants found in human speech. Under one aspect of the invention, the model is used in formant tracking by providing probabilities that describe the likelihood that a candidate formant is actually a formant in the speech signal. Other aspects of the invention use this formant tracking to improve the model by regenerating the model based on the formants detected by the formant tracker. Still other aspects of the invention use the formant tracking to compress a speech signal by removing some of the formants from the speech signal. A further aspect of the invention uses the formant model to synthesize speech. Under this aspect of the invention, the formant model is used to identify a most likely formant track for the synthesized speech. Based on this track, a series of resonators are used to introduce the formants into the speech signal.
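The last step of the abstract above (a series of resonators introducing formants into the signal) can be illustrated with a minimal sketch. It assumes a common second-order digital resonator form of the kind used in Klatt-style formant synthesizers; the patent's own resonator design is not given in this listing, and the function names, sampling rate, and formant values below are hypothetical.

```python
import math

def resonator_coeffs(freq_hz, bw_hz, fs):
    """Coefficients of y[n] = a*x[n] + b*y[n-1] + c*y[n-2] (unity gain at DC)."""
    c = -math.exp(-2.0 * math.pi * bw_hz / fs)
    b = 2.0 * math.exp(-math.pi * bw_hz / fs) * math.cos(2.0 * math.pi * freq_hz / fs)
    a = 1.0 - b - c
    return a, b, c

def apply_resonator(x, freq_hz, bw_hz, fs):
    """Filter the sample list x through one resonator."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, fs)
    y1 = y2 = 0.0  # previous two output samples
    out = []
    for s in x:
        y = a * s + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def synthesize(excitation, formants, fs=8000):
    """Cascade one resonator per (frequency, bandwidth) pair; values are illustrative."""
    signal = excitation
    for freq_hz, bw_hz in formants:
        signal = apply_resonator(signal, freq_hz, bw_hz, fs)
    return signal
```

In a full synthesizer the `formants` list would change from frame to frame, following the most likely formant track selected by the model.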
    • 3. Granted invention patent
    • Computer method and apparatus for grapheme-to-phoneme rule-set-generation
    • US06347295B1
    • 2002-02-12
    • US09179153
    • 1998-10-26
    • Anthony J. Vitale; Ginger Chun-Che Lin; Thomas Kopec
    • Anthony J. Vitale; Ginger Chun-Che Lin; Thomas Kopec
    • G10L1906
    • G10L13/08
    • A computer method and apparatus provide automatic generation of grapheme-to-phoneme rules, used in text-to-speech synthesis systems. The invention method and apparatus are based on a statistical analysis of a subject dictionary. The dictionary preferably contains words and their corresponding phonemic data representations, and is analyzed for subgraph patterns. The phoneme strings for words containing the subgraph patterns are then analyzed for common phoneme substrings (subphones) associated with each subgraph. The subphones associated with each subgraph are then checked for conditions such as the highest occurrence count, the proper length, and for compatibility with both ends of the subgraph to which they are associated. A subphone matching these conditions becomes paired with the subgraph to create a rule for text-to-speech processing. Separate prefix, infix, and suffix rule sets may be generated from the invention dictionary analysis.
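The rule-selection idea in the abstract above (pair a subgraph with the phoneme substring that occurs most often among the words containing it) can be sketched roughly as follows. This is a toy illustration, not the patented procedure: the dictionary, the phoneme symbols, and the length limits are hypothetical, and the check for compatibility with both ends of the subgraph is omitted.

```python
from collections import Counter

# Hypothetical toy dictionary: spelling -> phoneme sequence.
TOY_DICTIONARY = {
    "nation": ["n", "ey", "sh", "ah", "n"],
    "station": ["s", "t", "ey", "sh", "ah", "n"],
    "motion": ["m", "ow", "sh", "ah", "n"],
    "lotion": ["l", "ow", "sh", "ah", "n"],
    "cat": ["k", "ae", "t"],
}

def best_subphone(dictionary, subgraph, min_len=2, max_len=3):
    """Return the phoneme substring shared by the most words containing subgraph."""
    counts = Counter()
    for word, phonemes in dictionary.items():
        if subgraph not in word:
            continue
        seen = set()  # count each substring at most once per word
        for n in range(min_len, max_len + 1):
            for i in range(len(phonemes) - n + 1):
                seen.add(tuple(phonemes[i:i + n]))
        counts.update(seen)
    if not counts:
        return None
    # Highest occurrence count first; ties go to the longer substring.
    return max(sorted(counts), key=lambda s: (counts[s], len(s)))
```

With the toy data, `best_subphone(TOY_DICTIONARY, "tion")` pairs the subgraph `tion` with `("sh", "ah", "n")`, which could then be stored as a suffix rule.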
    • 4. Granted invention patent
    • Wideband speech codec using a higher sampling rate in analysis and synthesis filtering than in excitation searching
    • US06732070B1
    • 2004-05-04
    • US09505411
    • 2000-02-16
    • Jani Rotola-Pukkila; Hannu Mikkola; Janne Vainio
    • Jani Rotola-Pukkila; Hannu Mikkola; Janne Vainio
    • G10L1906
    • G10L21/038; G10L19/0208; G10L19/12
    • A codec (coder and decoder) in which LP analysis and LP synthesis of a full wideband speech signal is performed, and, in an excitation search part of the coder (searching for a codeword in case of CELP), the signal is divided into a lower band and a higher band with the lower band searched using a decimated target signal obtained by decimating the input speech signal after filtering it through a wideband LP analysis filter. White noise is optionally used for the higher band excitation. In the decoder, the lower band excitation is first interpolated, and then the two excitations (lower band and higher band) are added together and filtered through a wideband LP synthesis filter. Thus, an LP encoding is provided in which the sampling rate used for the search for a lower band excitation is less than the wideband sampling rate used in the LP analysis and synthesis.
    • 6. Granted invention patent
    • Audio signal compression method, audio signal compression apparatus, speech signal compression method, speech signal compression apparatus, speech recognition method, and speech recognition apparatus
    • US06477490B2
    • 2002-11-05
    • US09892745
    • 2001-06-28
    • Yoshihisa Nakatoh; Takeshi Norimatsu; Mineo Tsushima; Tomokazu Ishikawa; Mitsuhiko Serikawa; Taro Katayama; Junichi Nakahashi; Yoriko Yagi
    • Yoshihisa Nakatoh; Takeshi Norimatsu; Mineo Tsushima; Tomokazu Ishikawa; Mitsuhiko Serikawa; Taro Katayama; Junichi Nakahashi; Yoriko Yagi
    • G10L1906
    • H04B1/665; G10L2019/0005
    • An audio signal compression apparatus for compressively coding an input audio signal comprises a time-to-frequency transformation unit for transforming the input audio signal to a frequency domain signal; a spectrum envelope calculation unit for calculating a spectrum envelope having different resolutions for different frequencies, from the input audio signal, using a weighting function on frequency based on human auditory characteristics; a normalization unit for normalizing the frequency domain signal using the spectrum envelope to obtain a residual signal; a power normalization unit for normalizing the residual signal by the power; an auditory weighting calculation unit for calculating weighting coefficients on frequency, based on the spectrum of the input audio signal and human auditory characteristics; and a multi-stage quantization device having plural stages of vector quantizers connected in series, to which the normalized residual signal is input, and at least one of the vector quantizers quantizing the residual signal using the weighting coefficients. Therefore, a low frequency band, which is auditively important, can be analyzed with a higher frequency resolution as compared with a high frequency band, whereby efficient signal compression utilizing human auditory characteristics is realized.
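The two normalization stages named in the abstract above (dividing the frequency-domain signal by the spectrum envelope, then normalizing the residual by its power) might look like the following minimal sketch. It assumes magnitude values held in plain lists and uses the root-mean-square value as the "power"; both choices, and the function name, are assumptions rather than details taken from the patent.

```python
import math

def normalize_spectrum(spectrum, envelope, eps=1e-12):
    """Return (power-normalized residual, residual power) for one frame."""
    # Stage 1: flatten the spectrum by the envelope to obtain the residual.
    residual = [s / max(e, eps) for s, e in zip(spectrum, envelope)]
    # Stage 2: scale the residual to unit RMS (RMS as "power" is an assumption).
    power = math.sqrt(sum(r * r for r in residual) / len(residual))
    normalized = [r / max(power, eps) for r in residual]
    return normalized, power
```

The returned `power` would be transmitted alongside the quantized residual so that a decoder can undo the scaling.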
    • 7. Granted invention patent
    • Low frequency spectral enhancement system and method
    • US06233549B1
    • 2001-05-15
    • US09199072
    • 1998-11-23
    • Anthony P. Mauro; Gilbert C. Sih
    • Anthony P. Mauro; Gilbert C. Sih
    • G10L1906
    • G10L21/02; G10L21/0232
    • A system for enhancing low frequency spectral content of a digitized signal which identifies a fundamental frequency component in the signal and selectively boosts signals within a predetermined range thereof. In the illustrative embodiment, the digitized signal is a frequency domain transformed speech signal. The invention amplifies the low frequency components of the speech signal. The speaker unique fundamental frequency of the speech is computed using pitch delay information and is thus dynamic from frame to frame and also speaker to speaker. This fundamental frequency defines the center point of a gain window which is applied to select frequency components. Only such fundamental frequency components which exhibit a large enough signal to noise ratio have the amplification function applied. Thus, this function can be applied directly following a noise suppression system which has knowledge of the signal quality in each frequency bin. The gain window is ramped up and hanged over to smooth the amplification function between successive frames.
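The core of the abstract above can be sketched as: derive the fundamental frequency from the pitch delay, centre a gain window on it, and boost only bins with a sufficient signal-to-noise ratio. The triangular window shape, the default parameter values, and the function name are assumptions for illustration; the frame-to-frame ramp-up and hangover smoothing mentioned in the abstract is not modelled.

```python
def enhance_low_band(mags, snr_db, pitch_lag, fs=8000, n_fft=256,
                     max_gain=2.0, half_width_hz=100.0, snr_min_db=10.0):
    """Boost spectral magnitudes near the fundamental frequency for one frame."""
    f0 = fs / pitch_lag      # fundamental frequency from the pitch delay in samples
    bin_hz = fs / n_fft      # frequency spacing between bins
    out = list(mags)
    for k, (mag, snr) in enumerate(zip(mags, snr_db)):
        dist = abs(k * bin_hz - f0)
        if dist >= half_width_hz or snr < snr_min_db:
            continue         # outside the gain window, or too noisy to amplify
        weight = 1.0 - dist / half_width_hz   # triangular window (assumed shape)
        out[k] = mag * (1.0 + (max_gain - 1.0) * weight)
    return out
```

Because the per-bin SNR is an input, a function like this could sit directly after a noise suppressor that already tracks signal quality in each frequency bin, as the abstract suggests.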
    • 9. Granted invention patent
    • Variable dimension spectral magnitude quantization apparatus and method using predictive and mel-scale binary vector
    • US06606592B1
    • 2003-08-12
    • US09584107
    • 2000-05-31
    • Yong-duk Cho; Moo-young Kim
    • Yong-duk Cho; Moo-young Kim
    • G10L1906
    • G10L19/06; G10L25/12; G10L2019/0007
    • A variable dimension spectral magnitude quantization apparatus and method using a predictive and mel scale binary vector is provided. The apparatus according to linear prediction spectral envelope and residual spectral envelope quantization using low order linear prediction modeling and residual spectrum modeling, includes a predictive quantizer for obtaining a predictive-quantized first residual spectral envelope from a quantized previous residual spectral envelope, a mel-scale binary vector quantizer for obtaining a second residual spectral envelope represented with a linear scale code vector using a mel-scale binary vector codebook, a synthesized spectral envelope generator for adding the output of the predictive quantizer and the output of the mel-scale binary vector quantizer to generate a quantized residual spectral envelope and multiplying the quantized residual spectral envelope by a corresponding quantized linear prediction spectral envelope to generate a synthesized spectral envelope, a comparator for comparing the synthesized spectral envelope with an original spectral envelope, and a minimum value detector for detecting a minimum value from the values sequentially obtained by the comparator.
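The generator, comparator, and minimum-value detector described in the abstract above amount to a codebook search, which might be sketched as follows. The sketch assumes fixed-length envelopes and a squared-error comparison; the variable-dimension handling and the mel-scale to linear-scale mapping of the code vectors are left out, and all names and values are hypothetical.

```python
def search_codebook(original, lp_env, predicted, codebook):
    """Return (index, error) of the code vector giving the closest synthesized envelope."""
    best_index, best_err = -1, float("inf")
    for i, code_vector in enumerate(codebook):
        # Quantized residual envelope = predictive part + code vector.
        residual = [p + c for p, c in zip(predicted, code_vector)]
        # Synthesized envelope = residual envelope * quantized LP envelope.
        synthesized = [r * e for r, e in zip(residual, lp_env)]
        # Comparator: squared error against the original envelope (assumed metric).
        err = sum((s - o) ** 2 for s, o in zip(synthesized, original))
        if err < best_err:   # minimum-value detector
            best_index, best_err = i, err
    return best_index, best_err
```

The selected index is what an encoder would transmit; a decoder holding the same codebook can rebuild the synthesized envelope from it.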
    • 10. Granted invention patent
    • Speech coding employing hybrid linear prediction coding
    • US06606591B1
    • 2003-08-12
    • US09548204
    • 2000-04-13
    • Huan-yu Su
    • Huan-yu Su
    • G10L1906
    • G10L19/06; G10L19/087; G10L19/26
    • A speech coding system that employs hybrid linear prediction coding during extraction of linear prediction coefficients within ITU-Recommendation speech coding standards. The present invention is operable within linear prediction speech coding systems including code-excited linear prediction speech coding systems, and it provides for a substantially improved perceptual quality of reproduced speech signals when compared to conventional speech coding methods that employ the commonly known auto-correlation method that is based on minimizing the linear prediction coding (LPC) prediction error energy. The invention is operable to provide for high perceptual quality of reproduced speech signals having substantial differences of energy in various frequency bands. For example, for speech signals having information dispersed broadly across the frequency spectrum, such as having a significant amount of information at low frequency and a significant amount of information at high frequency, the invention provides a way to maintain a high perceptual quality across the broad frequency range. The invention generates a single set of linear prediction coefficients (LPCs) either directly from the speech signal in certain embodiments of the invention, or alternatively, interveningly through the use of line spectral frequencies (LSFs) that are generated from different sets of linear prediction coefficients (LPCs) generated from the speech signal itself in other embodiments of the invention.
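For context, the conventional auto-correlation method that the abstract above compares against can be sketched with the standard Levinson-Durbin recursion. This is the baseline technique, not the patent's hybrid method; windowing and bandwidth expansion, which practical coders usually add, are omitted, and the function names are illustrative.

```python
def autocorr(x, order):
    """Autocorrelation values r[0..order] of the sample list x."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]

def levinson_durbin(r, order):
    """Solve for predictor coefficients a[1..order] minimizing prediction error energy.

    The predictor is x_hat[n] = sum over k of a[k] * x[n - k].
    Returns (coefficients, residual error energy).
    """
    a = [0.0] * (order + 1)   # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err         # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err
```

For example, a decaying signal `x[n] = 0.9 ** n` yields a first coefficient close to 0.9 and a second close to zero, since one past sample predicts it almost perfectly.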