    • 2. Invention Application
    • SPEECH RECOGNITION AND TEXT-TO-SPEECH LEARNING SYSTEM
    • Publication No.: WO2017172658A1 (published 2017-10-05)
    • Application No.: PCT/US2017/024388 (filed 2017-03-28)
    • Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
    • Inventors: ZHAO, Pei; YAO, Kaisheng; LEUNG, Max; YAN, Bo
    • IPC: G10L13/08; G10L15/07
    • CPC: G10L13/10; G10L13/08; G10L13/086; G10L15/063; G10L15/07
    • Abstract: An example text-to-speech learning system performs a method for generating a pronunciation sequence conversion model. The method includes generating a first pronunciation sequence from a speech input of a training pair and generating a second pronunciation sequence from a text input of the training pair. The method also includes determining a pronunciation sequence difference between the first pronunciation sequence and the second pronunciation sequence; and generating a pronunciation sequence conversion model based on the pronunciation sequence difference. An example speech recognition learning system performs a method for generating a pronunciation sequence conversion model. The method includes extracting an audio signal vector from a speech input and applying an audio signal conversion model to the audio signal vector to generate a converted audio signal vector. The method also includes adapting an acoustic model based on the converted audio signal vector to generate an adapted acoustic model.
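The first method in the abstract, aligning a speech-derived pronunciation sequence against a text-derived one and building a conversion model from their differences, can be sketched roughly as follows. The phoneme symbols, the alignment via `difflib.SequenceMatcher`, and the count-based "model" are illustrative stand-ins, not the patent's actual implementation:

```python
from difflib import SequenceMatcher

def pronunciation_differences(speech_seq, text_seq):
    """Align two phoneme sequences and collect the mismatched spans."""
    matcher = SequenceMatcher(a=text_seq, b=speech_seq, autojunk=False)
    diffs = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            diffs.append((tuple(text_seq[i1:i2]), tuple(speech_seq[j1:j2])))
    return diffs

def build_conversion_model(training_pairs):
    """Count how often each text-derived span is realized differently
    in the corresponding speech-derived sequence."""
    model = {}
    for speech_seq, text_seq in training_pairs:
        for src, dst in pronunciation_differences(speech_seq, text_seq):
            model.setdefault(src, {}).setdefault(dst, 0)
            model[src][dst] += 1
    return model

# One training pair: "the cat" realized with a reduced vowel in speech
pairs = [(["dh", "ax", "k", "ae", "t"], ["dh", "iy", "k", "ae", "t"])]
model = build_conversion_model(pairs)
```

A real system would learn the conversion statistically over many pairs; the dictionary of difference counts here only shows the shape of the data flow.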
    • 3. Invention Application
    • NATURAL MOTION-BASED CONTROL VIA WEARABLE AND MOBILE DEVICES
    • Publication No.: WO2016053822A1 (published 2016-04-07)
    • Application No.: PCT/US2015/052542 (filed 2015-09-28)
    • Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
    • Inventors: WANG, Jiaping; LI, Yujia; HUANG, Xuedong; WU, Lingfeng; XIONG, Wei; YAO, Kaisheng; ZWEIG, Geoffrey
    • IPC: G06F3/01; G06F3/0346; H04M1/725
    • CPC: G06F3/011; G06F1/163; G06F3/014; G06F3/017; G06F3/0346; H04M1/7253; H04M2250/12
    • Abstract: A "Natural Motion Controller" identifies various motions of one or more parts of a user's body to interact with electronic devices, thereby enabling various natural user interface (NUI) scenarios. The Natural Motion Controller constructs composite motion recognition windows by concatenating an adjustable number of sequential periods of inertial sensor data received from a plurality of separate sets of inertial sensors. Each of these separate sets of inertial sensors are coupled to, or otherwise provide sensor data relating to, a separate user worn, carried, or held mobile computing device. Each composite motion recognition window is then passed to a motion recognition model trained by one or more machine-based deep learning processes. This motion recognition model is then applied to the composite motion recognition windows to identify a sequence of one or more predefined motions. Identified motions are then used as the basis for triggering execution of one or more application commands.
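As a rough illustration of the "composite motion recognition window" idea, concatenating an adjustable number of sequential periods of inertial data from several devices, consider the sketch below. The stream layout, period sizes, and six-channel samples are assumptions for illustration; the trained deep-learning motion model that would consume the window is omitted:

```python
def composite_window(sensor_streams, num_periods, period_len):
    """Build one composite recognition window: for each of the most recent
    `num_periods` periods, concatenate that period's samples from every
    device's inertial stream (wearable, handheld, ...)."""
    window = []
    for p in range(num_periods):
        period_samples = []
        for stream in sensor_streams:
            start = len(stream) - (num_periods - p) * period_len
            period_samples.extend(stream[start:start + period_len])
        window.append(period_samples)
    return window

# Two hypothetical devices, each sample = (ax, ay, az, gx, gy, gz)
wrist = [(0.0,) * 6 for _ in range(100)]
phone = [(0.0,) * 6 for _ in range(100)]
window = composite_window([wrist, phone], num_periods=4, period_len=10)
# 4 periods, each holding 10 samples from each of the 2 devices
```

Making `num_periods` adjustable lets the recognizer trade latency against the length of the motion it can capture, which is the "adjustable number of sequential periods" the abstract refers to.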
    • 4. Invention Application
    • INTENT RECOGNITION AND EMOTIONAL TEXT-TO-SPEECH LEARNING SYSTEM
    • Publication No.: WO2017218243A2 (published 2017-12-21)
    • Application No.: PCT/US2017/036241 (filed 2017-06-07)
    • Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
    • Inventors: ZHAO, Pei; YAO, Kaisheng; LEUNG, Max; YAN, Bo; LUAN, Jian; SHI, Yu; MA, Malone; HWANG, Mei-Yuh
    • IPC: G10L25/63; G06F17/27; G10L15/26
    • CPC: G10L25/63; G06F17/2785; G06N3/0445; G06N7/005; G10L15/265
    • Abstract: An example intent-recognition system comprises a processor and memory storing instructions. The instructions cause the processor to receive speech input comprising spoken words. The instructions cause the processor to generate text results based on the speech input and generate acoustic feature annotations based on the speech input. The instructions also cause the processor to apply an intent model to the text result and the acoustic feature annotations to recognize an intent based on the speech input. An example system for adapting an emotional text-to-speech model comprises a processor and memory. The memory stores instructions that cause the processor to receive training examples comprising speech input and receive labelling data comprising emotion information associated with the speech input. The instructions also cause the processor to extract audio signal vectors from the training examples and generate an emotion-adapted voice font model based on the audio signal vectors and the labelling data.
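The intent model described above consumes both the recognized text and acoustic feature annotations. A minimal linear scorer over the concatenated features, with hand-set weights standing in for a trained model and hypothetical intent labels, might look like:

```python
def recognize_intent(text_features, acoustic_features, weights):
    """Score each intent as a linear function of the concatenated text and
    acoustic feature vectors and return the best-scoring one.  In a real
    system `weights` would be learned, and the features would come from a
    speech recognizer and an acoustic annotator rather than be given
    directly."""
    x = text_features + acoustic_features
    scores = {
        intent: sum(w * v for w, v in zip(ws, x))
        for intent, ws in weights.items()
    }
    return max(scores, key=scores.get)

# Illustrative: 2 text features + 1 acoustic feature (e.g. vocal arousal)
weights = {"query": [1.0, 0.0, 0.0], "command": [0.0, 1.0, 1.0]}
intent = recognize_intent([0.2, 0.9], [0.5], weights)
```

The point of combining the two feature sources is that the same words can carry different intents depending on how they are said; the acoustic annotations supply that missing signal.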
    • 5. Invention Application
    • DEEP NEURAL SUPPORT VECTOR MACHINES
    • Publication No.: WO2016165120A1 (published 2016-10-20)
    • Application No.: PCT/CN2015/076857 (filed 2015-04-17)
    • Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
    • Inventors: ZHANG, Shixiong; LIU, Chaojun; YAO, Kaisheng; GONG, Yifan
    • IPC: G10L15/02
    • CPC: G10L15/16; G06N3/02; G06N99/005; G10L15/187; G10L2015/025
    • Abstract: Aspects of the technology described herein relates to a new type of deep neural network (DNN). The new DNN is described herein as a deep neural support vector machine (DNSVM). Traditional DNNs use the multinomial logistic regression (softmax activation) at the top layer and underlying layers for training. The new DNN instead uses a support vector machine (SVM) as one or more layers, including the top layer. The technology described herein can use one of two training algorithms to train the DNSVM to learn parameters of SVM and DNN in the maximum-margin criteria. The first training method is a frame-level training. In the frame-level training, the new model is shown to be related to the multiclass SVM with DNN features. The second training method is the sequence-level training. The sequence-level training is related to the structured SVM with DNN features and HMM state transition features.
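The frame-level training the abstract mentions amounts to a multiclass maximum-margin (hinge) objective over per-class scores. A minimal Crammer-Singer-style loss is sketched below; in the DNSVM setting the scores would come from an SVM top layer applied to DNN features, which is omitted here:

```python
def multiclass_hinge_loss(scores, correct, margin=1.0):
    """Maximum-margin criterion for one frame: the correct class's score
    must exceed every competitor's by `margin`; otherwise the shortfall
    of the worst violator is the loss (Crammer-Singer multiclass hinge)."""
    violations = [
        scores[k] - scores[correct] + margin
        for k in range(len(scores))
        if k != correct
    ]
    return max(max(violations), 0.0)

# Correct class 0 leads class 2 by only 0.5 < margin 1.0, so loss = 0.5
loss = multiclass_hinge_loss([2.0, 0.5, 1.5], correct=0)
```

Replacing the usual softmax cross-entropy with this objective is what makes the top layer an SVM: training pushes the correct class's score a full margin above all competitors rather than merely maximizing its probability.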
    • 6. Invention Application
    • HYPER-STRUCTURE RECURRENT NEURAL NETWORKS FOR TEXT-TO-SPEECH
    • Publication No.: WO2015191968A1 (published 2015-12-17)
    • Application No.: PCT/US2015/035504 (filed 2015-06-12)
    • Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
    • Inventors: ZHAO, Pei; LEUNG, Max; YAO, Kaisheng; YAN, Bo; ZHAO, Sheng; ALLEVA, Fileno A.
    • IPC: G10L13/10
    • CPC: G10L13/08; G06N3/02; G06N3/0445; G10L13/10
    • Abstract: The technology relates to converting text to speech utilizing recurrent neural networks (RNNs). The recurrent neural networks may be implemented as multiple modules for determining properties of the text. In embodiments, a part-of-speech RNN module, letter-to-sound RNN module, a linguistic prosody tagger RNN module, and a context awareness and semantic mining RNN module may all be utilized. The properties from the RNN modules are processed by a hyper-structure RNN module that determines the phonetic properties of the input text based on the outputs of the other RNN modules. The hyper-structure RNN module may generate a generation sequence that is capable of being converted to audible speech by a speech synthesizer. The generation sequence may also be optimized by a global optimization module prior to being synthesized into audible speech.
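The modular pipeline described above, with several analysis RNNs feeding a hyper-structure combiner that emits a generation sequence, can be sketched with plain callables standing in for the RNN modules. The module names and toy outputs are illustrative only:

```python
def tts_front_end(text, modules, combiner):
    """Run every analysis module (part-of-speech, letter-to-sound,
    prosody, context) over the text, then let the hyper-structure
    combiner merge their outputs into one generation sequence for the
    synthesizer."""
    outputs = {name: module(text) for name, module in modules.items()}
    return combiner(outputs)

# Toy stand-ins for the patent's RNN modules:
modules = {
    "pos": lambda t: ["DT", "NN"],                  # part-of-speech tags
    "lts": lambda t: ["dh", "ax", "k", "ae", "t"],  # letter-to-sound
    "prosody": lambda t: ["L", "H"],                # prosody tags
}
# Toy combiner: attach the leading prosody tag to each phoneme
combiner = lambda o: [(ph, o["prosody"][0]) for ph in o["lts"]]
seq = tts_front_end("the cat", modules, combiner)
```

The sketch only shows the data flow: independent analyses fan in to one combining module, whose output sequence would then pass through global optimization and the synthesizer.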