Quick Patent Search: search global patents quickly in a free database of commercially usable patents (IPRDB)

1. Invention application

US20160322055A1 PROCESSING MULTI-CHANNEL AUDIO WAVEFORMS (status: in force)
Publication No.: US20160322055A1
Publication date: 2016-11-03
Application No.: US15205321
Filing date: 2016-07-08
Applicant: Google Inc.
Inventors: Tara N. Sainath, Ron J. Weiss, Kevin William Wilson, Andrew W. Senior, Arun Narayanan, Yedid Hoshen, Michiel A. U. Bacchiani
IPC classification: G10L19/008, G10L15/06, G10L19/26, G10L25/30
CPC classification: G10L15/16, G06N3/0445, G06N3/0454, G10L15/02, G10L15/063, G10L2021/02166, H04R3/005
Abstract: Methods, including computer programs encoded on a computer storage medium, for enhancing the processing of audio waveforms for speech recognition using various neural network processing techniques. In one aspect, a method includes: receiving multiple channels of audio data corresponding to an utterance; convolving each of multiple filters, in a time domain, with each of the multiple channels of audio waveform data to generate convolution outputs, wherein the multiple filters have parameters that have been learned during a training process that jointly trains the multiple filters and trains a deep neural network as an acoustic model; combining, for each of the multiple filters, the convolution outputs for the filter for the multiple channels of audio waveform data; inputting the combined convolution outputs to the deep neural network trained jointly with the multiple filters; and providing a transcription for the utterance that is determined.
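The convolve-then-combine step described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the fixed filter taps (which in the patent are learned jointly with the acoustic model), the layout `filters[p][c]` (taps of filter `p` for channel `c`), and the use of a plain sum as the combining operation are all assumptions made for illustration.

```python
from typing import List, Sequence


def convolve_valid(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Time-domain convolution, keeping only fully overlapping positions."""
    n, k = len(signal), len(kernel)
    if k == 0 or k > n:
        return []
    flipped = list(kernel)[::-1]  # convolution flips the kernel
    return [
        sum(signal[t + i] * flipped[i] for i in range(k))
        for t in range(n - k + 1)
    ]


def filter_and_combine(
    channels: Sequence[Sequence[float]],
    filters: Sequence[Sequence[Sequence[float]]],
) -> List[List[float]]:
    """For each filter, convolve its taps with every channel and sum over channels.

    filters[p][c] holds the taps of filter p for channel c (hypothetical layout).
    Summation is an assumed combining rule; the abstract only says "combining".
    """
    combined = []
    for per_channel_taps in filters:
        outputs = [
            convolve_valid(channel, taps)
            for channel, taps in zip(channels, per_channel_taps)
        ]
        # Add the per-channel outputs sample by sample.
        combined.append([sum(samples) for samples in zip(*outputs)])
    return combined


if __name__ == "__main__":
    channels = [[1, 2, 3, 4], [0, 1, 0, 1]]        # two microphone channels
    filters = [[[1, 0], [0, 1]], [[1, 1], [1, -1]]]  # two filters, two taps each
    print(filter_and_combine(channels, filters))
```

In the method the abstract describes, the combined outputs would then be fed to the jointly trained deep neural network; that stage is omitted here.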

2. Invention application

US20170353789A1 SOUND SOURCE ESTIMATION USING NEURAL NETWORKS (status: under examination, published)
Publication No.: US20170353789A1
Publication date: 2017-12-07
Application No.: US15170348
Filing date: 2016-06-01
Applicant: Google Inc.
Inventors: Chanwoo Kim, Rajeev Conrad Nongpiur, Arun Narayanan
IPC classification: H04R3/00, G10L25/30, H04R5/027
CPC classification: H04R3/005, G10L25/30, H04R5/027, H04R2201/401, H04R2430/20, H04S2400/11, H04S2400/15, H04S2420/01
Abstract: A system for estimating the location of a stationary or moving sound source includes multiple microphones, which need not be physically aligned in a linear array or a regular geometric pattern in a given environment, an auralizer that generates auralized multi-channel signals based at least on array-related transfer functions and room impulse responses of the microphones as well as signal labels corresponding to the auralized multi-channel signals, a feature extractor that extracts features from the auralized multi-channel signals for efficient processing, and a neural network that can be trained to estimate the location of the sound source based at least on the features extracted from the auralized multi-channel signals and the corresponding signal labels.
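One way to picture the auralizer stage is as a routine that turns a single dry source signal into a labeled multi-channel training example by convolving it with one impulse response per microphone. The sketch below rests on that assumption; the function names, the toy impulse responses, and the use of an azimuth angle as the label are hypothetical and are not taken from the patent.

```python
from typing import List, Sequence, Tuple


def convolve_full(signal: Sequence[float], impulse_response: Sequence[float]) -> List[float]:
    """Full time-domain convolution of a signal with an impulse response."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def auralize(
    dry_signal: Sequence[float],
    impulse_responses: Sequence[Sequence[float]],
    location_label: float,
) -> Tuple[List[List[float]], float]:
    """Simulate what each microphone would record for a source at a known location.

    impulse_responses[m] is the (hypothetical) combined array and room response
    for microphone m; location_label is the training target paired with the result.
    """
    channels = [convolve_full(dry_signal, ir) for ir in impulse_responses]
    return channels, location_label


if __name__ == "__main__":
    # Microphone 0 hears the source directly; microphone 1 hears it one sample later.
    example, label = auralize([1.0, 0.0, 2.0], [[1.0], [0.0, 1.0]], 30.0)
    print(example, label)
```

Feature extraction and the neural network that maps features to a location estimate are left out of this sketch.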

3. Granted invention patent

US10063965B2 Sound source estimation using neural networks (status: in force)
Publication No.: US10063965B2
Publication date: 2018-08-28
Application No.: US15170348
Filing date: 2016-06-01
Applicant: Google Inc.
Inventors: Chanwoo Kim, Rajeev Conrad Nongpiur, Arun Narayanan
IPC classification: G10L25/30, H04R3/00, H04R5/027
CPC classification: H04R3/005, G01S5/18, G10L25/30, H04R5/027, H04R2201/401, H04R2430/20, H04S2400/11, H04S2400/15, H04S2420/01
Abstract: A system for estimating the location of a stationary or moving sound source includes multiple microphones, which need not be physically aligned in a linear array or a regular geometric pattern in a given environment, an auralizer that generates auralized multi-channel signals based at least on array-related transfer functions and room impulse responses of the microphones as well as signal labels corresponding to the auralized multi-channel signals, a feature extractor that extracts features from the auralized multi-channel signals for efficient processing, and a neural network that can be trained to estimate the location of the sound source based at least on the features extracted from the auralized multi-channel signals and the corresponding signal labels.

4. Invention application

US20180068675A1 ENHANCED MULTI-CHANNEL ACOUSTIC MODELS (status: under examination, published)
Publication No.: US20180068675A1
Publication date: 2018-03-08
Application No.: US15350293
Filing date: 2016-11-14
Applicant: Google Inc.
Inventors: Ehsan Variani, Kevin William Wilson, Ron J. Weiss, Tara N. Sainath, Arun Narayanan
IPC classification: G10L25/30, G10L21/028, G10L21/0388
CPC classification: G10L25/30, G10L15/16, G10L15/20, G10L19/008, G10L21/028, G10L21/0388, G10L2021/02087, G10L2021/02166
Abstract: This specification describes computer-implemented methods and systems. One method includes receiving, by a neural network of a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal. The first raw audio signal and the second raw audio signal describe audio occurring at a same period of time. The method further includes generating, by a spatial filtering layer of the neural network, a spatial filtered output using the first data and the second data, and generating, by a spectral filtering layer of the neural network, a spectral filtered output using the spatial filtered output. Generating the spectral filtered output comprises processing frequency-domain data representing the spatial filtered output. The method still further includes processing, by one or more additional layers of the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.
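The two-stage structure in this abstract (a spatial layer over two raw signals, then a spectral layer working on frequency-domain data) can be sketched as follows. This is an illustrative guess at one possible realization, not the patented network: the filter-and-sum spatial step, the naive DFT, the power spectrum, the fixed weights, and all function names are assumptions, and in the described system the corresponding parameters would be learned.

```python
import cmath
from typing import List, Sequence


def fir(signal: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Causal FIR filter, output truncated to the input length."""
    return [
        sum(taps[i] * signal[t - i] for i in range(len(taps)) if t - i >= 0)
        for t in range(len(signal))
    ]


def spatial_filter(x1, x2, h1, h2) -> List[float]:
    """Filter-and-sum over two channels (an assumed form of the spatial layer)."""
    return [a + b for a, b in zip(fir(x1, h1), fir(x2, h2))]


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """Squared magnitude of a naive O(n^2) discrete Fourier transform."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n)
    ]


def spectral_filter(frame: Sequence[float], weights: Sequence[Sequence[float]]) -> List[float]:
    """Weight the frequency-domain representation of the spatial output.

    Each row of weights (hypothetical) pools the power spectrum into one band.
    """
    power = power_spectrum(frame)
    return [sum(w * p for w, p in zip(row, power)) for row in weights]


if __name__ == "__main__":
    x1 = [1.0, 0.0, 0.0, 0.0]  # first raw audio frame
    x2 = [0.0, 0.0, 0.0, 0.0]  # second raw audio frame, same time span
    spatial = spatial_filter(x1, x2, [1.0], [1.0])
    print(spectral_filter(spatial, [[1, 1, 0, 0], [0, 0, 1, 1]]))
```

The additional layers that predict sub-word units from the spectral output are not modeled here.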

5. Granted invention patent

US09697826B2 Processing multi-channel audio waveforms (status: in force)
Publication No.: US09697826B2
Publication date: 2017-07-04
Application No.: US15205321
Filing date: 2016-07-08
Applicant: Google Inc.
Inventors: Tara N. Sainath, Ron J. Weiss, Kevin William Wilson, Andrew W. Senior, Arun Narayanan, Yedid Hoshen, Michiel A. U. Bacchiani
IPC classification: G10L15/16, G10L15/06, G10L21/0216, G10L15/02
CPC classification: G10L15/16, G06N3/0445, G06N3/0454, G10L15/02, G10L15/063, G10L2021/02166, H04R3/005
Abstract: Methods, including computer programs encoded on a computer storage medium, for enhancing the processing of audio waveforms for speech recognition using various neural network processing techniques. In one aspect, a method includes: receiving multiple channels of audio data corresponding to an utterance; convolving each of multiple filters, in a time domain, with each of the multiple channels of audio waveform data to generate convolution outputs, wherein the multiple filters have parameters that have been learned during a training process that jointly trains the multiple filters and trains a deep neural network as an acoustic model; combining, for each of the multiple filters, the convolution outputs for the filter for the multiple channels of audio waveform data; inputting the combined convolution outputs to the deep neural network trained jointly with the multiple filters; and providing a transcription for the utterance that is determined.
