    • 94. Invention application
    • DEVICES AND METHODS FOR EVALUATING SPEECH QUALITY
    • WO2018028767A1
    • 2018-02-15
    • PCT/EP2016/068966
    • 2016-08-09
    • HUAWEI TECHNOLOGIES CO., LTD.; XIAO, Wei
    • XIAO, Wei; HAKAMI, Mona; KLEIJN, Willem Bastiaan
    • G10L25/69; G10L25/30
    • G10L25/60; G06F17/18; G10L15/02; G10L25/03; G10L25/30; G10L25/69
    • The invention relates to an apparatus (200) for determining a quality score (MOS) for an audio signal sample, the apparatus (200) comprising: an extractor (201) configured to extract a feature vector from the audio signal sample, wherein the feature vector comprises a plurality of feature values and wherein each feature value is associated to a different feature of the feature vector; a pre-processor (203) configured to pre-process a feature value of the feature vector based on a cumulative distribution function associated to the feature represented by the feature value to obtain a pre-processed feature value; and a processor (205) configured to implement a neural network and to determine the quality score (MOS) for the audio signal sample based on the pre-processed feature value and a set of neural network parameters for the neural network associated to the cumulative distribution function.
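The pre-processing step in the abstract of record 94 (mapping each feature value through a cumulative distribution function associated with its feature) can be illustrated with a minimal sketch. It assumes an empirical CDF built from a set of training values per feature; the abstract does not say how the CDF is obtained, and the names `make_empirical_cdf` and `preprocess` are hypothetical. The neural network that consumes the pre-processed values is not shown.

```python
from bisect import bisect_right


def make_empirical_cdf(training_values):
    """Return a function mapping a raw feature value to its empirical CDF value.

    Assumption: the CDF is estimated from training samples of one feature.
    """
    sorted_values = sorted(training_values)
    count = len(sorted_values)
    if count == 0:
        raise ValueError("need at least one training value")

    def cdf(x):
        # Fraction of training values that are less than or equal to x.
        return bisect_right(sorted_values, x) / count

    return cdf


def preprocess(feature_vector, cdfs):
    """Map each feature value through the CDF associated with its feature."""
    return [cdf(value) for value, cdf in zip(feature_vector, cdfs)]
```

Every pre-processed value lies in [0, 1] regardless of the feature's original scale, which is one plausible reason to pair the network parameters with the CDF used.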
    • 95. Invention application
    • AUDIO PROCESSING WITH NEURAL NETWORKS
    • WO2017196929A1
    • 2017-11-16
    • PCT/US2017/031888
    • 2017-05-10
    • GOOGLE LLC
    • ROBLEK, Dominik; SHARIFI, Matthew
    • G10L25/30
    • G06N3/08; G06F3/16; G06N3/049; G06N3/084; G10L25/30
    • Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio processing using neural networks. One of the systems includes multiple neural network layers, wherein the neural network system is configured to receive time domain features of an audio sample and to process the time domain features to generate a neural network output for the audio sample, the plurality of neural network layers comprising: a frequency-transform (F-T) layer that is configured to apply a transformation defined by a set of F-T layer parameters that transforms a window of time domain features into frequency domain features; and one or more other neural network layers having respective layer parameters, wherein the one or more neural network layers are configured to process frequency domain features to generate a neural network output.
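The frequency-transform (F-T) layer in the abstract of record 95 is a transformation defined by layer parameters that turns a window of time domain features into frequency domain features. A minimal sketch follows, assuming the parameters form a square matrix initialised to the discrete Fourier transform; that initialisation and the names `init_ft_parameters` and `ft_layer` are illustrative assumptions, not details taken from the abstract. Training the parameters is not shown.

```python
import cmath


def init_ft_parameters(window_size):
    """Build a window_size x window_size weight matrix equal to the DFT matrix.

    Assumption: the layer starts as a DFT; the weights could then be trained.
    """
    return [
        [cmath.exp(-2j * cmath.pi * k * n / window_size) for n in range(window_size)]
        for k in range(window_size)
    ]


def ft_layer(window, params):
    """Apply the parameterised linear transform to one window of samples."""
    return [sum(w * x for w, x in zip(row, window)) for row in params]
```

Because the transform is an ordinary weight matrix, it can sit in front of the other network layers and be adjusted by the same training procedure.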
    • 97. Invention application
    • NEURAL NETWORK VOICE ACTIVITY DETECTION EMPLOYING RUNNING RANGE NORMALIZATION
    • WO2016049611A1
    • 2016-03-31
    • PCT/US2015/052519
    • 2015-09-26
    • CYPHER, LLC
    • VICKERS, Earl
    • G10L15/16; G10L25/27; G10L25/78
    • G10L21/0264; G10L21/0224; G10L25/30; G10L25/60; G10L25/78; G10L25/84; G10L2015/0636
    • A "running range normalization" method includes computing running estimates of the range of values of features useful for voice activity detection (VAD) and normalizing the features by mapping them to a desired range. Running range normalization includes computation of running estimates of the minimum and maximum values of VAD features and normalizing the feature values by mapping the original range to a desired range. Smoothing coefficients are optionally selected to directionally bias a rate of change of at least one of the running estimates of the minimum and maximum values. The normalized VAD feature parameters are used to train a machine learning algorithm to detect voice activity and to use the trained machine learning algorithm to isolate or enhance the speech component of the audio data.
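The running range normalization in the abstract of record 97 keeps running estimates of a feature's minimum and maximum and maps the observed range onto a desired range. The sketch below is one possible reading, assuming exponential smoothing with two coefficients: a fast one when a new value extends the range and a slow one otherwise, which gives the directional bias the abstract mentions. The class name, parameter names, and default values are hypothetical.

```python
class RunningRangeNormalizer:
    """Track a running min/max of one feature and map values to a target range."""

    def __init__(self, fast=0.5, slow=0.01, out_min=0.0, out_max=1.0):
        self.fast = fast      # smoothing used when a value extends the range
        self.slow = slow      # smoothing used when the range relaxes inward
        self.out_min = out_min
        self.out_max = out_max
        self.run_min = None
        self.run_max = None

    def update(self, x):
        """Update the running estimates with x and return the normalized value."""
        if self.run_min is None:
            self.run_min = self.run_max = x
        else:
            # Directional bias: follow new extremes quickly, drift back slowly.
            a = self.fast if x < self.run_min else self.slow
            self.run_min += a * (x - self.run_min)
            b = self.fast if x > self.run_max else self.slow
            self.run_max += b * (x - self.run_max)
        span = self.run_max - self.run_min
        if span <= 0:
            return self.out_min
        # Clamp because x can fall outside the smoothed estimates.
        unit = min(1.0, max(0.0, (x - self.run_min) / span))
        return self.out_min + unit * (self.out_max - self.out_min)
```

With `fast=1.0` and `slow=0.0` the estimates reduce to the exact minimum and maximum seen so far, which is a convenient way to check the mapping.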
    • 98. Invention application
    • SYSTEMS AND METHODS FOR RESTORATION OF SPEECH COMPONENTS
    • WO2016040885A1
    • 2016-03-17
    • PCT/US2015/049816
    • 2015-09-11
    • AUDIENCE, INC.
    • AVENDANO, Carlos; WOODRUFF, John
    • G10L21/02
    • G10L21/02; G10L21/0208; G10L21/038; G10L25/30
    • A method for restoring distorted speech components of an audio signal distorted by a noise reduction or a noise cancellation includes determining distorted frequency regions and undistorted frequency regions in the audio signal. The distorted frequency regions include regions of the audio signal in which a speech distortion is present. Iterations are performed using a model to refine predictions of the audio signal at distorted frequency regions. The model is configured to modify the audio signal and may include deep neural network trained using spectral envelopes of clean or undamaged audio signals. Before each iteration, the audio signal at the undistorted frequency regions is restored to values of the audio signal prior to the first iteration; while the audio signal at distorted frequency regions is refined starting from zero at the first iteration. Iterations are ended when discrepancies of audio signal at undistorted frequency regions meet pre-defined criteria.
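The iteration in the abstract of record 98 can be sketched as a loop: distorted regions start from zero, undistorted regions are reset to their original values before each pass, a model refines the whole signal, and the loop stops once the model's output agrees with the originals at the undistorted regions. The sketch below assumes a per-bin list of magnitudes and uses a three-point average as a stand-in for the trained deep neural network; the function names, the tolerance, and the iteration cap are all hypothetical.

```python
def three_point_average(values):
    """Stand-in for the trained model: average each bin with its neighbours."""
    last = len(values) - 1
    return [
        (values[max(i - 1, 0)] + values[i] + values[min(i + 1, last)]) / 3.0
        for i in range(len(values))
    ]


def restore_distorted_bins(spectrum, distorted, model, tol=1e-3, max_iters=50):
    """Iteratively refine distorted bins; `distorted` is a list of booleans."""
    # Distorted bins start from zero at the first iteration.
    estimate = [0.0 if d else s for s, d in zip(spectrum, distorted)]
    for _ in range(max_iters):
        # Before each iteration, reset undistorted bins to their original values.
        estimate = [e if d else s for e, s, d in zip(estimate, spectrum, distorted)]
        predicted = model(estimate)
        # Discrepancy is measured only at the undistorted bins.
        discrepancy = max(
            (abs(p - s) for p, s, d in zip(predicted, spectrum, distorted) if not d),
            default=0.0,
        )
        estimate = predicted
        if discrepancy <= tol:
            break
    # Return original values at undistorted bins, refined values elsewhere.
    return [e if d else s for e, s, d in zip(estimate, spectrum, distorted)]
```

For a flat spectrum with one distorted bin, the refined value approaches the surrounding level within a few iterations.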