    • 2. Granted invention patent
    • Apparatus and method for speech recognition
    • US07257532B2
    • 2007-08-14
    • US10667150
    • 2003-09-22
    • Soichi Toyama
    • Soichi Toyama
    • G10L15/00
    • G10L15/07; G10L15/20
    • Before executing a speech recognition, a composite acoustic model adapted to noise is generated by composition of a noise adaptive representative acoustic model generated by noise-adaptation of each representative acoustic model and difference models stored in advance in a storing section, respectively. Then, the noise and speaker adaptive acoustic model is generated by executing speaker-adaptation to the composite acoustic model with the feature vector series of uttered speech. The renewal difference model is generated by the difference between the noise and speaker adaptive acoustic model and the noise adaptive representative acoustic model, to replace the difference model stored in the storing section therewith. The speech recognition is performed by comparing the feature vector series of the uttered speech to be recognized with the composite acoustic model adapted to noise and speaker generated by the composition of the noise adaptive representative acoustic model and the renewal difference model.
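The model bookkeeping described in the abstract above can be sketched as follows. This is only an illustration under strong simplifying assumptions: each acoustic model is reduced to a single mean vector, "composition" is taken to be vector addition, and `noise_adapt` and `speaker_adapt` are hypothetical stand-ins for adaptation procedures that the abstract does not specify.

```python
def add(u, v):
    """Element-wise sum of two equal-length vectors."""
    return [a + b for a, b in zip(u, v)]


def sub(u, v):
    """Element-wise difference of two equal-length vectors."""
    return [a - b for a, b in zip(u, v)]


def noise_adapt(representative, noise_mean):
    # Hypothetical noise adaptation: shift the model mean by the noise mean.
    return add(representative, noise_mean)


def speaker_adapt(composite, features, rate=0.5):
    # Hypothetical speaker adaptation: move the composite model part of the
    # way towards the mean of the uttered feature vectors.
    dim = len(composite)
    mean = [sum(f[d] for f in features) / len(features) for d in range(dim)]
    return [c + rate * (m - c) for c, m in zip(composite, mean)]


def renew_difference(representative, difference, noise_mean, features):
    """Return (noise-adapted representative model, renewal difference model)."""
    noise_rep = noise_adapt(representative, noise_mean)
    composite = add(noise_rep, difference)          # composite model adapted to noise
    adapted = speaker_adapt(composite, features)    # noise and speaker adaptive model
    return noise_rep, sub(adapted, noise_rep)       # difference to store in place of the old one


noise_rep, renewal = renew_difference(
    representative=[1.0, 2.0],
    difference=[0.5, -0.5],
    noise_mean=[0.1, 0.1],
    features=[[2.0, 2.0], [4.0, 2.0]],
)
# Model used for recognition: noise-adapted representative + renewal difference.
recognition_model = add(noise_rep, renewal)
```

Storing only the renewal difference means the speaker information survives a change of noise environment: the representative model is re-adapted to the new noise and the stored difference is added back.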
    • 3. Invention patent application
    • Voice recognition system
    • US20050091053A1
    • 2005-04-28
    • US10995509
    • 2004-11-24
    • Hajime Kobayashi; Mitsuya Komamura; Soichi Toyama
    • Hajime Kobayashi; Mitsuya Komamura; Soichi Toyama
    • G10L11/02; G10L15/02; G10L15/04; G10L11/04
    • G10L25/78
    • A trained vector creating part 15 creates a characteristic of an unvoiced sound in advance as a trained vector V. Meanwhile, a threshold value THD for distinguishing a voice from a background sound is created based on a predictive residual power ε of a sound which is created during a non-voice period. When a voice is actually uttered, an inner product computation part 18 calculates an inner product of a feature vector A of an input signal Sa and the trained vector V. A first threshold value judging part 19 judges that it is a voice section when the inner product is equal to or larger than a predetermined value θ, while a second threshold value judging part 21 judges that it is a voice section when the predictive residual power ε of the input signal Sa is larger than the threshold value THD. When at least one of the first threshold value judging part 19 and the second threshold value judging part 21 judges that it is a voice section, a voice section determining part 300 finally judges that it is a voice section and cuts out an input signal Saf, which is in units of frames and corresponds to this voice section, as a voice Svc which is to be recognized.
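The two-threshold decision in the abstract above can be sketched as follows. This is a minimal illustration, not the patented implementation: the frame layout (a dict with `feature` and `residual_power` keys) and all function names are assumptions introduced here, and the computation of the feature vector and of the predictive residual power is left out.

```python
def inner_product(a, v):
    """Inner product of feature vector A and trained vector V."""
    return sum(x * y for x, y in zip(a, v))


def is_voice_frame(feature, trained, residual_power, theta, thd):
    # First judgement: inner product is equal to or larger than theta.
    first = inner_product(feature, trained) >= theta
    # Second judgement: predictive residual power is larger than THD.
    second = residual_power > thd
    # The frame is a voice section if at least one judgement says so.
    return first or second


def cut_voice_frames(frames, trained, theta, thd):
    """Keep only the frames judged to belong to a voice section."""
    return [
        f for f in frames
        if is_voice_frame(f["feature"], trained, f["residual_power"], theta, thd)
    ]


frames = [
    {"feature": [1.0, 0.0], "residual_power": 0.1},  # passes the inner-product test
    {"feature": [0.0, 0.1], "residual_power": 5.0},  # passes the residual-power test
    {"feature": [0.0, 0.1], "residual_power": 0.1},  # passes neither
]
voiced = cut_voice_frames(frames, trained=[1.0, 0.0], theta=0.5, thd=1.0)
```

Combining the two tests with a logical OR lets the inner-product test catch unvoiced sounds that the power test alone would miss.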
    • 4. Granted invention patent
    • Acoustic signal processing unit
    • US5444784A
    • 1995-08-22
    • US64804
    • 1993-05-21
    • Soichi Toyama
    • Soichi Toyama
    • G10K15/12; H03G3/00
    • G10K15/12; Y10S84/26
    • A sound echo machine as an acoustic signal processing unit of the present invention comprises an adder to which an input signal is fed and a delay circuit that delays the signal fed from the adder for a certain time and repeatedly feeds it back to the adder to generate an echo sound. It further comprises an input signal level detector that detects the level of the input signal and sends it to a frequency oscillator, which varies the oscillating frequency in accordance with the detected signal level and feeds it to the delay circuit so as to modulate the delay time at a predetermined cycle. The unit can thereby create an acoustic field in which a listener feels as if reflected sounds of various levels are coming towards him from various directions. On the other hand, a sound effecter as an acoustic signal processing unit comprises a plurality of acoustic signal processing sections, a plurality of attenuators each connected to these acoustic signal processing sections, and an adder for summing all the signals from these attenuators. It further comprises a signal mixing ratio control section that monitors the input acoustic signal level and determines a signal mixing ratio among the respective output signals from the plurality of acoustic signal processing sections in accordance with the monitored level, whereby even a simple structure can provide a specific sound effect.
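The adder-plus-delay feedback loop at the core of the echo machine can be sketched as a feedback comb filter. This sketch covers only that basic loop: the delay is fixed, so the level-dependent modulation of the delay time described in the abstract is omitted, and the feedback `gain` is an assumption added here so that the echoes decay.

```python
def echo(signal, delay, gain):
    """Feedback echo: out[n] = signal[n] + gain * out[n - delay]."""
    out = []
    for n, x in enumerate(signal):
        # The delay circuit returns the adder output from `delay` samples ago.
        fed_back = gain * out[n - delay] if n >= delay else 0.0
        # The adder sums the input sample and the delayed feedback.
        out.append(x + fed_back)
    return out


# A single impulse produces a train of echoes spaced `delay` samples apart.
result = echo([1.0, 0.0, 0.0, 0.0, 0.0], delay=2, gain=0.5)
# result == [1.0, 0.0, 0.5, 0.0, 0.25]
```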
    • 6. Invention patent application
    • Speech Recognition Device and Speech Recognition Method
    • US20080270127A1
    • 2008-10-30
    • US11547322
    • 2005-03-15
    • Hajime Kobayashi; Soichi Toyama; Yasunori Suzuki
    • Hajime Kobayashi; Soichi Toyama; Yasunori Suzuki
    • G10L21/02; G10L15/20
    • G10L15/065; G10L15/20; G10L2015/0635
    • There is provided a voice recognition device and a voice recognition method that enhance the function of noise adaptation processing in voice recognition processing and reduce the capacity of a memory being used. Acoustic models are subjected to clustering processing to calculate the centroid of each cluster and the differential vector between the centroid and each model, model composition between each kind of assumed noise model and the calculated centroid is carried out, and the centroid of each composition model and the differential vector are stored in a memory. In the actual recognition processing, the centroid optimal to the environment estimated by the utterance environmental estimation is extracted from the memory, model restoration is carried out on the extracted centroid by using the differential vector stored in the memory, and noise adaptation processing is executed on the basis of the restored model.
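The centroid-and-differential-vector storage scheme in the abstract above can be sketched for a single cluster as follows. This is an illustration under simplifying assumptions: each acoustic model is a plain mean vector, and `compose_with_noise` is a hypothetical stand-in (simple addition) for the model composition step, which the abstract does not specify.

```python
def centroid(models):
    """Mean vector of the models in one cluster."""
    dim = len(models[0])
    return [sum(m[d] for m in models) / len(models) for d in range(dim)]


def differentials(models, center):
    """Differential vector between the centroid and each model."""
    return [[m[d] - center[d] for d in range(len(center))] for m in models]


def compose_with_noise(center, noise_mean):
    # Hypothetical model composition: add the assumed noise model's mean.
    return [c + n for c, n in zip(center, noise_mean)]


def restore(center, diffs):
    """Rebuild every model in the cluster from a centroid and the stored differentials."""
    return [[c + d for c, d in zip(center, diff)] for diff in diffs]


cluster = [[1.0, 2.0], [3.0, 4.0]]
center = centroid(cluster)                 # [2.0, 3.0]
diffs = differentials(cluster, center)     # [[-1.0, -1.0], [1.0, 1.0]]

# Stored in memory: one composed centroid per assumed noise type, plus one
# shared set of differential vectors.
composed = compose_with_noise(center, [0.5, 0.5])

# At recognition time: pick the composed centroid that suits the estimated
# environment, then restore the models from it.
restored = restore(composed, diffs)        # [[1.5, 2.5], [3.5, 4.5]]
```

The memory saving comes from keeping one composed centroid per noise type instead of one full composed model set per noise type.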
    • 7. Granted invention patent
    • Speech recognition system with an adaptive acoustic model
    • US07065488B2
    • 2006-06-20
    • US09964677
    • 2001-09-28
    • Kiyoshi Yajima; Soichi Toyama
    • Kiyoshi Yajima; Soichi Toyama
    • G10L15/28; G10L15/20; G10L21/02
    • G10L21/0208; G10L15/10; G10L15/144; G10L15/20
    • At the time of the speaker adaptation, first feature vector generation sections (7, 8, 9) generate a feature vector series [Ci, M] from which the additive noise and multiplicative noise are removed. A second feature vector generation section (12) generates a feature vector series [Si, M] including the features of the additive noise and multiplicative noise. A path search section (10) conducts a path search by comparing the feature vector series [Ci, m] to the standard vector [an, m] of the standard voice HMM (300). The speaker adaptation section (11) then conducts a correlation operation on an average feature vector [S^n, m] of the standard vector [an, m] corresponding to the path search result Dv and the feature vector series [Si, m], which generates the adaptive vector [xn, m]. The adaptive vector [xn, m] updates the feature vector of the speaker adaptive acoustic model (400) used for the speech recognition.
    • 9. Granted invention patent
    • Pitch control apparatus for setting coefficients for cross-fading operation in accordance with intervals between write address and a number of read addresses in a sampling cycle
    • US5522010A
    • 1996-05-28
    • US425226
    • 1995-04-18
    • Soichi Toyama
    • Soichi Toyama
    • G10H1/20; G10H7/02; G10L13/02; G10L21/04; G10L3/02
    • G10L13/033; G10H1/20; G10H7/02; G10L13/0335; G10L21/04; G10H2250/631
    • A pitch control apparatus which suppresses the occurrence of a tremolo tone when the interval control is performed. Input audio signal data is written at a memory position at a designated writing address in a memory in a predetermined order for every sampling cycle. A plurality of reading addresses of the memory are designated for every sampling cycle, and are set in a different order from the predetermined order for each cycle which is a multiple of the sampling cycle by a predetermined multiplier. Data is read from the memory positions of the designated plurality of reading addresses, a coefficient is set in accordance with the address interval between the writing address and each of the designated reading addresses, the data read out at the plurality of reading addresses are multiplied by the associated coefficients, and the results are added together as output data. The maximum value of the interval between each of the plurality of reading addresses, $D_{max}$, is set as $D_{max} = T_{dmax} / \{(1 - (1/J_n)) \cdot T_0\}$ when the pitch is to be raised, and as $D_{max} = T_{dmax} / \{(1 + (1/J_n)) \cdot T_0\}$ when the pitch is to be lowered, where $T_0$ denotes the sampling cycle of the input audio signal data, $J_n$ denotes how many times longer than the sampling cycle $T_0$ the cycle for skipping sampling data or reading sampling data twice should be, and $T_{dmax}$ denotes an allowable time for a time-dependent data shift between the plurality of reading addresses. The allowable time is set to 45 to 80 msec, at which the reverberation phenomenon is not remarkably disturbing.
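The two expressions for the maximum reading-address interval in the abstract above can be evaluated with a small helper. The function name, its parameter names, and the example values are assumptions chosen for illustration; only the formula itself comes from the abstract.

```python
def max_read_interval(td_max, jn, t0, raise_pitch):
    """Maximum interval Dmax between reading addresses.

    td_max: allowable time shift Tdmax between reading addresses, in seconds
    jn:     ratio Jn of the skip/repeat cycle to the sampling cycle
    t0:     sampling cycle T0, in seconds
    """
    if raise_pitch and jn == 1:
        # (1 - 1/Jn) would be zero, so the expression is undefined.
        raise ValueError("jn must differ from 1 when raising the pitch")
    factor = (1 - 1 / jn) if raise_pitch else (1 + 1 / jn)
    return td_max / (factor * t0)


# Illustrative values: Tdmax = 60 msec (inside the 45 to 80 msec range),
# Jn = 4, and a 1 msec sampling cycle chosen only to keep the numbers round.
d_up = max_read_interval(0.06, 4, 0.001, raise_pitch=True)     # about 80
d_down = max_read_interval(0.06, 4, 0.001, raise_pitch=False)  # about 48
```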