    • 1. Invention Application
    • PERMUTATION INVARIANT TRAINING FOR TALKER-INDEPENDENT MULTI-TALKER SPEECH SEPARATION
    • Publication No.: WO2017200782A1
    • Publication Date: 2017-11-23
    • Application No.: PCT/US2017/031473
    • Filing Date: 2017-05-06
    • Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
    • Inventor: YU, Dong
    • IPC: G10L21/0272
    • Abstract: The techniques described herein improve methods to equip a computing device to conduct automatic speech recognition ("ASR") in talker-independent multi-talker scenarios. In some examples, permutation invariant training of deep learning models can be used for talker-independent multi-talker scenarios. In some examples, the techniques can determine a permutation-considered assignment between a model's estimate of a source signal and the source signal. In some examples, the techniques can include training the model generating the estimate to minimize a deviation of the permutation-considered assignment. These techniques can be implemented into a neural network's structure itself, solving the label permutation problem that prevented making progress on deep learning based techniques for speech separation. The techniques discussed herein can also include source tracing to trace streams originating from a same source through the frames of a mixed signal.
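
A minimal sketch, in NumPy, of the permutation-considered assignment described in the abstract above: the separation error is evaluated for every possible pairing between the model's estimates and the reference source signals, and the pairing with the smallest total error is the one used for training. The array shapes and the mean-squared-error criterion are illustrative assumptions, not details taken from the patent.

```python
import itertools
import numpy as np

def pit_mse_loss(estimates, references):
    """Permutation-invariant MSE between S estimated and S reference sources.

    estimates, references: arrays of shape (S, T, F) -- S sources, T frames,
    F frequency bins (a hypothetical layout chosen for this sketch).
    Returns the minimum mean-squared error over all S! output-to-source
    assignments, together with the best permutation.
    """
    S = estimates.shape[0]
    # Pairwise error between every (estimate, reference) pair.
    pair_err = np.zeros((S, S))
    for i in range(S):
        for j in range(S):
            pair_err[i, j] = np.mean((estimates[i] - references[j]) ** 2)

    # Evaluate every permutation and keep the one with the lowest total error.
    best_perm, best_loss = None, np.inf
    for perm in itertools.permutations(range(S)):
        loss = np.mean([pair_err[i, perm[i]] for i in range(S)])
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm

# Two 100-frame, 257-bin spectrogram estimates against references listed in
# the "wrong" order: the best permutation reassigns the labels and the loss is ~0.
est = np.random.rand(2, 100, 257)
ref = est[::-1]
loss, perm = pit_mse_loss(est, ref)   # perm == (1, 0)
```

Because the assignment is recomputed inside the loss, the network never has to commit to a fixed output-to-speaker labeling, which is the label permutation problem the abstract refers to.
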
    • 2. Invention Application
    • AUTOMATED PREDICTIVE MODELING AND FRAMEWORK
    • Publication No.: WO2017139237A1
    • Publication Date: 2017-08-17
    • Application No.: PCT/US2017/016759
    • Filing Date: 2017-02-06
    • Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
    • Inventors: SHAN, Ying; HOENS, Thomas Ryan; JIAO, Jian; WANG, Haijing; YU, Dong; MAO, Jc
    • IPC: G06N3/04; G06Q30/02
    • CPC: G06N3/08; G06F17/30864; G06N3/0481; G06Q10/04; G06Q30/0242
    • Abstract: Systems and methods of a predictive framework are provided. The predictive framework comprises plural neural layers of adaptable, executable neurons. Neurons accept one or more input signals and produce an output signal that may be used by an upper-level neural layer. Input signals are received by an encoding neural layer, where there is a 1:1 correspondence between an input signal and an encoding neuron. Input signals are received at the encoding layer and processed successively by the various neural layers. An objective function utilizes the output signals of the topmost neural layer to generate predictive results for the data set according to an objective. In one embodiment, the objective is to determine the likelihood of user interaction with regard to a specific item of content in a set of search results, or the likelihood of user interaction with regard to any item of content in a set of search results.
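
A toy NumPy forward pass of the layered structure the abstract above outlines: an encoding layer with one neuron per input signal (modeled here as an element-wise transform, which is an assumption), followed by ordinary fully connected layers, with a sigmoid output read as the likelihood of user interaction with a content item. Layer sizes, activations, and names are illustrative, not the patent's actual architecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class PredictiveFramework:
    """Toy stack of neural layers ending in a click-likelihood objective."""

    def __init__(self, n_inputs, hidden_sizes, seed=0):
        rng = np.random.default_rng(seed)
        # Encoding layer: 1:1 correspondence, one (weight, bias) per input signal.
        self.enc_w = rng.normal(size=n_inputs)
        self.enc_b = np.zeros(n_inputs)
        # Upper neural layers: fully connected, with a single topmost unit.
        sizes = [n_inputs] + list(hidden_sizes) + [1]
        self.layers = [(rng.normal(size=(m, n)) / np.sqrt(m), np.zeros(n))
                       for m, n in zip(sizes[:-1], sizes[1:])]

    def predict(self, x):
        # The encoding layer processes each input signal with its own neuron.
        h = np.tanh(self.enc_w * x + self.enc_b)
        # Successive processing by the remaining neural layers.
        for w, b in self.layers[:-1]:
            h = np.tanh(h @ w + b)
        w, b = self.layers[-1]
        # The topmost layer feeds the objective: likelihood of user interaction.
        return sigmoid(h @ w + b)[0]

model = PredictiveFramework(n_inputs=16, hidden_sizes=[32, 16])
features = np.random.default_rng(1).normal(size=16)   # one search-result impression (made up)
click_probability = model.predict(features)           # value in (0, 1)
```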

    • 3. Invention Application
    • MULTI-SPEAKER SPEECH SEPARATION
    • Publication No.: WO2017112466A1
    • Publication Date: 2017-06-29
    • Application No.: PCT/US2016/066430
    • Filing Date: 2016-12-14
    • Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
    • Inventor: YU, Dong
    • IPC: G10L15/16; G06N3/04; G10L15/07; G10L21/0272; G10L17/18
    • CPC: G10L25/30; G06N3/0445; G10L15/07; G10L15/16; G10L15/197; G10L15/20; G10L15/22; G10L15/26; G10L17/18; G10L21/0272; G10L25/18; G10L25/21; G10L2015/223
    • Abstract: The technology described herein uses a multiple-output layer RNN to process an acoustic signal comprising speech from multiple speakers to trace an individual speaker's speech. The multiple-output layer RNN has multiple output layers, each of which is meant to trace one speaker (or noise) and represent the mask for that speaker (or noise). The output layer for each speaker (or noise) can have the same dimensions and can be normalized for each output unit across all output layers. The rest of the layers in the multiple-output layer RNN are shared across all the output layers. The result from the previous frame is used as input to the output layer or to one of the hidden layers of the RNN to calculate results for the current frame. This pass back of results allows the model to carry information from previous frames to future frames to trace the same speaker.
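
A rough NumPy sketch of the multiple-output-layer RNN idea in the abstract above: a shared recurrent body, one same-sized output layer per source producing a mask, a softmax that normalizes each output unit across the output layers, and the previous frame's masks fed back as extra input so each head keeps tracing the same speaker. The sizes, the single recurrent layer, and the softmax choice are assumptions for illustration.

```python
import numpy as np

def softmax(z, axis=0):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

class MultiOutputMaskRNN:
    """Shared RNN body with one mask-producing output layer per source."""

    def __init__(self, n_bins, n_hidden, n_sources, seed=0):
        rng = np.random.default_rng(seed)
        # Shared layers: a single recurrent cell used by every output layer.
        in_dim = n_bins + n_sources * n_bins      # current frame + previous-frame masks
        self.w_in = rng.normal(size=(in_dim, n_hidden)) * 0.1
        self.w_rec = rng.normal(size=(n_hidden, n_hidden)) * 0.1
        # One output layer per source (or noise), all with the same dimensions.
        self.heads = [rng.normal(size=(n_hidden, n_bins)) * 0.1 for _ in range(n_sources)]
        self.n_bins, self.n_sources = n_bins, n_sources

    def separate(self, mixture):
        """mixture: (T, n_bins) magnitude spectrogram -> (n_sources, T, n_bins) masks."""
        T = mixture.shape[0]
        h = np.zeros(self.w_rec.shape[0])
        prev_masks = np.full((self.n_sources, self.n_bins), 1.0 / self.n_sources)
        masks = np.zeros((self.n_sources, T, self.n_bins))
        for t in range(T):
            # Previous-frame results are passed back in so the same head
            # keeps tracing the same speaker across frames.
            x = np.concatenate([mixture[t], prev_masks.ravel()])
            h = np.tanh(x @ self.w_in + h @ self.w_rec)
            scores = np.stack([h @ w_out for w_out in self.heads])   # (n_sources, n_bins)
            # Normalize each output unit (bin) across all output layers.
            prev_masks = softmax(scores, axis=0)
            masks[:, t, :] = prev_masks
        return masks

model = MultiOutputMaskRNN(n_bins=129, n_hidden=64, n_sources=2)
mix = np.abs(np.random.default_rng(1).normal(size=(50, 129)))   # 50-frame toy mixture
masks = model.separate(mix)   # masks sum to 1 across the two heads at every bin
```

Normalizing each bin across the heads makes the masks compete for the energy in that bin, and feeding the previous masks back is one way to realize what the abstract calls carrying information from previous frames to future frames.
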
    • 5. Invention Application
    • LOW-FOOTPRINT ADAPTATION AND PERSONALIZATION FOR A DEEP NEURAL NETWORK
    • Publication No.: WO2015134294A1
    • Publication Date: 2015-09-11
    • Application No.: PCT/US2015/017872
    • Filing Date: 2015-02-27
    • Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
    • Inventors: XUE, Jian; LI, Jinyu; YU, Dong; SELTZER, Michael L.; GONG, Yifan
    • IPC: G10L15/07; G10L15/16
    • CPC: G10L15/16; G06N3/082; G10L15/075
    • Abstract: The adaptation and personalization of a deep neural network (DNN) model for automatic speech recognition is provided. An utterance which includes speech features for one or more speakers may be received in ASR tasks such as voice search or short message dictation. A decomposition approach may then be applied to an original matrix in the DNN model. In response to applying the decomposition approach, the original matrix may be converted into multiple new matrices which are smaller than the original matrix. A square matrix may then be added to the new matrices. Speaker-specific parameters may then be stored in the square matrix. The DNN model may then be adapted by updating the square matrix. This process may be applied to all of a number of original matrices in the DNN model. The adapted DNN model may include fewer parameters than the original DNN model.
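
A short NumPy sketch of the decomposition-based adaptation described in the abstract above: the original weight matrix is factored into two smaller matrices (truncated SVD here, one possible decomposition), a small square matrix initialized to the identity is inserted between them, and per-speaker adaptation updates only that square matrix. The layer size and rank are assumed values chosen for illustration.

```python
import numpy as np

def decompose(W, k):
    """Factor the original m-by-n matrix into two smaller matrices via truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    P = U[:, :k] * s[:k]          # m x k
    Q = Vt[:k, :]                 # k x n
    return P, Q

def adapted_weight(P, Q, S):
    """Speaker-adapted layer weight: the k-by-k square matrix S holds the speaker-specific parameters."""
    return P @ S @ Q

rng = np.random.default_rng(0)
W = rng.normal(size=(512, 512)) / np.sqrt(512)   # one original DNN weight matrix (stand-in)

k = 64                         # bottleneck rank (assumed)
P, Q = decompose(W, k)
S = np.eye(k)                  # identity, so the unadapted model behaves like P @ Q

# Adapting to a new speaker means updating only S (e.g. by back-propagation):
# k*k = 4,096 parameters for this layer instead of 512*512 = 262,144.
W_speaker = adapted_weight(P, Q, S)
```

Initializing S to the identity keeps the speaker-independent behaviour as the starting point, so adaptation only has to learn a small speaker-specific correction, which is what keeps the footprint low.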