Quick Patent Search - fast search of global patents, free commercial-use patent database - IPRDB

1. Granted invention patent

US09536545B2 Audio visual signature, method of deriving a signature, and method of comparing audio-visual data background (In force)
Publication No.: US09536545B2
Publication date: 2017-01-03
Application No.: US14190325
Filing date: 2014-02-26
Applicant: Snell Limited
Inventor: Jonathan Diggins
IPC classification: G10L11/04, G10L25/00, H04H60/37, H04H60/58, H04H60/59
CPC classification: G10L25/00, H04H60/37, H04H60/58, H04H60/59
Abstract: The invention relates to the analysis of characteristics of audio and/or video signals for the generation of audio-visual content signatures. To determine an audio signature, a region of interest (for example, one of high entropy) is identified in audio signature data. This region of interest is then provided as an audio signature with offset information. A video signature is also provided.
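
The region-of-interest step in the abstract of US09536545B2 can be illustrated with a minimal sketch. It assumes the signature data is a list of small integers and that "high entropy" means Shannon entropy over a fixed-length sliding window. The function names, the window length, and the sample data are hypothetical and are not taken from the patent.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def select_signature_region(signature_data, window):
    """Return (offset, region) for the window with the highest entropy.

    Assumption: a fixed-length sliding window; ties go to the earliest offset.
    """
    best_offset = max(
        range(len(signature_data) - window + 1),
        key=lambda i: shannon_entropy(signature_data[i:i + window]),
    )
    return best_offset, signature_data[best_offset:best_offset + window]


# Hypothetical signature data: a flat stretch, a varied stretch, a flat stretch.
data = [0] * 20 + [5, 1, 7, 0, 3, 6, 2, 4] + [0] * 20
offset, region = select_signature_region(data, window=8)
```

The returned offset plays the role of the "offset information" mentioned in the abstract: it records where in the full signature data the selected region sits.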

2. Granted invention patent

US09396740B1 Systems and methods for estimating pitch in audio signals based on symmetry characteristics independent of harmonic amplitudes (In force)
Publication No.: US09396740B1
Publication date: 2016-07-19
Application No.: US14502844
Filing date: 2014-09-30
Applicant: THE INTELLISIS CORPORATION
Inventor: David C. Bradley
IPC classification: G10L11/04, G10L25/90, G10L25/12, G10L25/15, G10L21/0264, G10L25/00
CPC classification: G10L25/90, G10L21/0264, G10L25/00, G10L25/15, G10L25/18, G10L2025/906
Abstract: Pitch in audio signals may be estimated based on symmetry characteristics independent of harmonic amplitudes. A magnitude spectrum of an audio signal may be provided. The magnitude spectrum may be partitioned by dividing a frequency axis into equal-sized cells. Individual cells may be centered on corresponding harmonic frequencies of a hypothesized pitch. The magnitude spectrum contained in individual cells may be normalized to have equal mean magnitudes and equal standard deviations. A likelihood that the hypothesized pitch is an actual pitch of the audio signal may be determined based on symmetries of magnitude spectra contained in individual cells.
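
The steps in the abstract of US09396740B1 (partition into cells, center on harmonics, normalize, score by symmetry) can be sketched roughly as follows. This is not the patented method: the symmetry measure used here, the correlation of each normalized cell with its own mirror image, is a stand-in chosen for illustration. The synthetic spectrum, the 1 Hz bin spacing, and all function names are assumptions.

```python
import math


def synthetic_spectrum(f0, amplitudes, n_bins=2001, width=5.0):
    """Toy magnitude spectrum on a 1 Hz grid: Gaussian peaks at harmonics of f0."""
    return [
        sum(a * math.exp(-0.5 * ((f - (k + 1) * f0) / width) ** 2)
            for k, a in enumerate(amplitudes))
        for f in range(n_bins)
    ]


def symmetry_score(mag, f0, n_harmonics=8, eps=1e-12):
    """Average mirror symmetry of cells centered on harmonics of a hypothesized f0."""
    half = int(f0 // 2)  # each cell spans one hypothesized pitch period in frequency
    scores = []
    for k in range(1, n_harmonics + 1):
        center = round(k * f0)
        if center - half < 0 or center + half >= len(mag):
            break  # cell would fall outside the spectrum
        cell = mag[center - half:center + half + 1]
        n = len(cell)
        mean = sum(cell) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in cell) / n)
        if std < eps:
            continue  # effectively empty cell: carries no symmetry information
        # Normalizing to zero mean and unit deviation removes harmonic amplitude.
        z = [(x - mean) / std for x in cell]
        # Correlation between the cell and its mirror image (1.0 = symmetric).
        scores.append(sum(z[i] * z[n - 1 - i] for i in range(n)) / n)
    return sum(scores) / len(scores) if scores else 0.0


# Harmonics of 200 Hz with deliberately uneven amplitudes.
amps = [1.0, 0.3, 0.8, 0.2, 0.6, 0.9, 0.1, 0.5]
mag = synthetic_spectrum(200, amps)
candidates = [170, 185, 200, 215, 230]
best = max(candidates, key=lambda f: symmetry_score(mag, f))
```

Because every cell is normalized before scoring, the uneven harmonic amplitudes do not affect the result. Only the position of each peak relative to its cell center matters. Sub-harmonic candidates such as 100 Hz can score as well as the true pitch in this toy version and would need separate handling.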

3. Granted invention patent

US08700388B2 Audio transform coding using pitch correction (In force)
Publication No.: US08700388B2
Publication date: 2014-04-15
Application No.: US12668912
Filing date: 2009-03-23
Applicant: Bernd Edler, Sascha Disch, Ralf Geiger, Stefan Bayer, Ulrich Kraemer, Guillaume Fuchs, Max Neuendorf, Markus Multrus, Gerald Schuller, Harald Popp
Inventor: Bernd Edler, Sascha Disch, Ralf Geiger, Stefan Bayer, Ulrich Kraemer, Guillaume Fuchs, Max Neuendorf, Markus Multrus, Gerald Schuller, Harald Popp
IPC classification: G10L11/04, G10L21/00, G10L15/00, G10L19/14, G10L19/02, G10L13/06
CPC classification: G10L19/0212, G10L19/022
Abstract: A processed representation of an audio signal having a sequence of frames is generated by sampling the audio signal within first and second frames of the sequence of frames, the second frame following the first frame, the sampling using information on a pitch contour of the first and second frames to derive a first sampled representation. The audio signal is sampled within the second and third frames, the third frame following the second frame in the sequence of frames. The sampling uses the information on the pitch contour of the second frame and information on a pitch contour of the third frame to derive a second sampled representation. A first scaling window is derived for the first sampled representation, and a second scaling window is derived for the second sampled representation, the scaling windows depending on the samplings applied to derive the first sampled representation or the second sampled representation.

4. Granted invention patent

US08676574B2 Method for tone/intonation recognition using auditory attention cues (In force)
Publication No.: US08676574B2
Publication date: 2014-03-18
Application No.: US12943774
Filing date: 2010-11-10
Applicant: Ozlem Kalinli
Inventor: Ozlem Kalinli
IPC classification: G10L11/04
CPC classification: G10L15/1807, G10L17/26, G10L25/90
Abstract: In a spoken language processing method for tone/intonation recognition, an auditory spectrum may be determined for an input window of sound and one or more multi-scale features may be extracted from the auditory spectrum. Each multi-scale feature can be extracted using a separate two-dimensional spectro-temporal receptive filter. One or more feature maps corresponding to the one or more multi-scale features can be generated and an auditory gist vector can be extracted from each of the one or more feature maps. A cumulative gist vector may be obtained through augmentation of each auditory gist vector extracted from the one or more feature maps. One or more tonal characteristics corresponding to the input window of sound can be determined by mapping the cumulative gist vector to one or more tonal characteristics using a machine learning algorithm.

5. Granted invention patent

US08560328B2 Encoding device, decoding device, and method thereof (In force)
Publication No.: US08560328B2
Publication date: 2013-10-15
Application No.: US12518371
Filing date: 2007-12-14
Applicant: Tomofumi Yamanashi, Masahiro Oshikiri
Inventor: Tomofumi Yamanashi, Masahiro Oshikiri
IPC classification: G10L19/00, G10L11/04, G01L11/06, G10L15/20, G10L21/04
CPC classification: G10L19/24, G10L21/038
Abstract: A decoding device is capable of flexibly calculating high-band spectrum data with a high accuracy in accordance with an encoding band selected by an upper-node layer of the encoding side. In this device: a first layer decoder decodes first layer encoded information to generate a first layer decoded signal; a second layer decoder decodes second layer encoded information to generate a second layer decoded signal; a spectrum decoder performs a band extension process by using the second layer decoded signal and the first layer decoded signal up-sampled in an up-sampler so as to generate an all-band decoded signal; and a switch outputs the first layer decoded signal or the all-band decoded signal according to the control information generated in a controller.

6. Granted invention patent

US08521519B2 Adaptive audio signal source vector quantization device and adaptive audio signal source vector quantization method that search for pitch period based on variable resolution (In force)
Publication No.: US08521519B2
Publication date: 2013-08-27
Application No.: US12528661
Filing date: 2008-02-29
Applicant: Kaoru Sato, Toshiyuki Morii
Inventor: Kaoru Sato, Toshiyuki Morii
IPC classification: G10L11/04
CPC classification: G10L25/90, G10L19/038, G10L19/09
Abstract: An adaptive sound source vector quantization device includes a first pitch cycle instructor, a search range calculator, and a second pitch cycle instructor. The first pitch cycle instructor successively instructs pitch cycle search candidates in a predetermined search range having a search resolution which transits over a predetermined pitch cycle candidate for the first sub-frame. The search range calculator calculates a predetermined range before and after the pitch cycle of the first sub-frame as the pitch cycle search range for the second sub-frame, if the predetermined range includes the predetermined pitch cycle search candidate. In the predetermined range, the search resolution transits over a boundary defined by the predetermined pitch cycle. The second pitch cycle instructor successively instructs the pitch cycle search candidates in the search range for the second sub-frame.

7. Granted invention patent

US08438014B2 Separating speech waveforms into periodic and aperiodic components, using artificial waveform generated from pitch marks (In force)
Publication No.: US08438014B2
Publication date: 2013-05-07
Application No.: US13358702
Filing date: 2012-01-26
Applicant: Masahiro Morita, Javier Latorre, Takehiko Kagoshima
Inventor: Masahiro Morita, Javier Latorre, Takehiko Kagoshima
IPC classification: G10L11/06, G10L11/04
CPC classification: G10L25/93, G10L25/90
Abstract: According to one embodiment, in a speech processing device, an extractor windows a part of the speech signal and extracts a partial waveform. A calculator performs frequency analysis of the partial waveform to calculate a frequency spectrum. An estimator generates an artificial waveform that is a waveform according to an interval between the pitch marks for each harmonic component having a frequency that is a predetermined multiple of a fundamental frequency of the speech signal and estimates harmonic spectral features representing characteristics of the frequency spectrum of the harmonic component from each of the artificial waveforms. A separator separates the partial waveform into a periodic component produced from periodic vocal-fold vibration as an acoustic source and an aperiodic component produced from aperiodic acoustic sources other than the vocal-fold vibration by using the respective harmonic spectral features and the frequency spectrum of the partial waveform.

8. Granted invention patent

US08412518B2 Time warped modified transform coding of audio signals (In force)
Publication No.: US08412518B2
Publication date: 2013-04-02
Application No.: US12697137
Filing date: 2010-01-29
Applicant: Lars Villemoes
Inventor: Lars Villemoes
IPC classification: G10L11/04, G10L21/04, G10L15/00, G10L19/02, G10L19/14, G10L13/06, G10L13/08, G10L19/00
CPC classification: G10L19/002, G10L19/0212, G10L19/022
Abstract: A representation of an audio signal having a first frame, a second frame following the first frame, and a third frame following the second frame, is derived by estimating first warp information for the first and the second frame and second warp information for the second frame and the third frame, the warp information describing pitch information of the audio signal. First spectral coefficients for the first and the second frame are derived using the first warp information and a first weighted representation of the first and the second frame, the first weighted representation derived by applying a first window function to the first and the second frames, wherein the first window function depends on the first warp information. Second spectral coefficients for the second and the third frame are derived using the second warp information and a second weighted representation of the second and the third frame, the second weighted representation derived by applying a second window function to the second and the third frames, wherein the second window function depends on the second warp information. The representation of the audio signal is generated including the first and the second spectral coefficients.

9. Granted invention patent

US08374858B2 Scalable lossless audio codec and authoring tool (In force)
Publication No.: US08374858B2
Publication date: 2013-02-12
Application No.: US12720365
Filing date: 2010-03-09
Applicant: Zoran Fejzo
Inventor: Zoran Fejzo
IPC classification: G10L19/00, G10L19/14, G10L11/04, G10L19/02, G10L21/04
CPC classification: G10L19/0017, G10L19/24
Abstract: An audio codec losslessly encodes audio data into a sequence of analysis windows in a scalable bitstream. This is suitably done by separating the audio data into MSB and LSB portions and encoding each with a different lossless algorithm. An authoring tool compares the buffered payload to an allowed payload for each window and selectively scales the losslessly encoded audio data, suitably the LSB portion, in the non-conforming windows to reduce the encoded payload, hence buffered payload. This approach satisfies the media bit rate and buffer capacity constraints without having to filter the original audio data, re-encode or otherwise disrupt the lossless bitstream.
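
The MSB/LSB separation in the abstract of US08374858B2 can be illustrated with a small sketch. It assumes signed integer PCM samples and an 8-bit LSB portion, and it models "scaling" the LSB portion as dropping its lowest-order bits. These choices and the function names are assumptions for illustration; the abstract does not specify the actual bit allocation or scaling rule.

```python
def split_sample(sample, lsb_bits):
    """Split a signed integer sample into (msb, lsb) portions."""
    msb = sample >> lsb_bits                 # arithmetic shift keeps the sign in msb
    lsb = sample & ((1 << lsb_bits) - 1)     # low-order bits, always non-negative
    return msb, lsb


def join_sample(msb, lsb, lsb_bits):
    """Recombine the two portions into the original sample value."""
    return (msb << lsb_bits) | lsb


def scale_lsb(lsb, drop_bits):
    """Assumed scaling rule: zero the lowest `drop_bits` bits of the LSB portion."""
    return (lsb >> drop_bits) << drop_bits


LSB_BITS = 8
samples = [0, 1, -1, 12345, -12345, 32767, -32768]

parts = [split_sample(s, LSB_BITS) for s in samples]
restored = [join_sample(m, l, LSB_BITS) for m, l in parts]              # bit-exact
reduced = [join_sample(m, scale_lsb(l, 3), LSB_BITS) for m, l in parts]  # scaled LSBs
```

With both portions intact the round trip is exact. After the LSB portion is scaled, each reconstructed sample differs from the original by less than 2^3 in this sketch, and the MSB portion is left unchanged.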

10. Granted invention patent

US08364472B2 Voice encoding device and voice encoding method (In force)
Publication No.: US08364472B2
Publication date: 2013-01-29
Application No.: US12528880
Filing date: 2008-02-29
Applicant: Hiroyuki Ehara
Inventor: Hiroyuki Ehara
IPC classification: G10L11/04
CPC classification: G10L19/005, G10L19/09, G10L2019/0011
Abstract: Provided is an audio encoding device which can detect an optimal pitch pulse when using pitch pulse information as redundant information. The device includes: a search start decision unit (121) which decides the oldest point among a plurality of points where a pitch pulse may exist; a pitch pulse candidate selection unit (122) which defines a search range as a range between the search start point and the point preceding the point of the head of the current frame by one and selects a decoding sound source vector having a large amplitude in this search range as a pitch pulse position candidate; a selector switch (125) which successively switches a plurality of pitch pulse position candidates inputted from a pitch pulse candidate selection unit (122) for output to a pulse sequence generation unit (123) and an error minimization unit (124); a pulse sequence generation unit (123) which generates as a pulse sequence, a vector generated as an adaptive codebook component from the pitch pulse in the current frame when a pitch pulse is set to be a pitch pulse position candidate inputted from the selector switch (125).
