    • 2. Invention Application
    • SPEECH RECOGNITION AND TEXT-TO-SPEECH LEARNING SYSTEM
    • Publication No.: WO2017172658A1 (published 2017-10-05)
    • Application No.: PCT/US2017/024388 (filed 2017-03-28)
    • Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
    • Inventors: ZHAO, Pei; YAO, Kaisheng; LEUNG, Max; YAN, Bo
    • IPC: G10L13/08; G10L15/07
    • CPC: G10L13/10; G10L13/08; G10L13/086; G10L15/063; G10L15/07
    • Abstract: An example text-to-speech learning system performs a method for generating a pronunciation sequence conversion model. The method includes generating a first pronunciation sequence from a speech input of a training pair and generating a second pronunciation sequence from a text input of the training pair. The method also includes determining a pronunciation sequence difference between the first pronunciation sequence and the second pronunciation sequence; and generating a pronunciation sequence conversion model based on the pronunciation sequence difference. An example speech recognition learning system performs a method for generating a pronunciation sequence conversion model. The method includes extracting an audio signal vector from a speech input and applying an audio signal conversion model to the audio signal vector to generate a converted audio signal vector. The method also includes adapting an acoustic model based on the converted audio signal vector to generate an adapted acoustic model.
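The first method in the abstract, aligning a speech-derived pronunciation sequence against a text-derived one and building a conversion model from their differences, can be sketched roughly as follows. The phoneme symbols, the alignment via `difflib.SequenceMatcher`, and the count-based "model" are illustrative stand-ins, not the patent's actual implementation:

```python
from difflib import SequenceMatcher

def pronunciation_differences(speech_seq, text_seq):
    """Align two phoneme sequences and collect the mismatched spans."""
    matcher = SequenceMatcher(a=text_seq, b=speech_seq, autojunk=False)
    diffs = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            diffs.append((tuple(text_seq[i1:i2]), tuple(speech_seq[j1:j2])))
    return diffs

def build_conversion_model(training_pairs):
    """Count how often each text-derived span is realized differently
    in the corresponding speech-derived sequence."""
    model = {}
    for speech_seq, text_seq in training_pairs:
        for src, dst in pronunciation_differences(speech_seq, text_seq):
            model.setdefault(src, {}).setdefault(dst, 0)
            model[src][dst] += 1
    return model

# One training pair: "the cat" realized with a reduced vowel in speech
pairs = [(["dh", "ax", "k", "ae", "t"], ["dh", "iy", "k", "ae", "t"])]
model = build_conversion_model(pairs)
```

A real system would learn the conversion statistically over many pairs; the dictionary of difference counts here only shows the shape of the data flow.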
    • 3. Invention Application
    • NATURAL MOTION-BASED CONTROL VIA WEARABLE AND MOBILE DEVICES
    • Publication No.: WO2016053822A1 (published 2016-04-07)
    • Application No.: PCT/US2015/052542 (filed 2015-09-28)
    • Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
    • Inventors: WANG, Jiaping; LI, Yujia; HUANG, Xuedong; WU, Lingfeng; XIONG, Wei; YAO, Kaisheng; ZWEIG, Geoffrey
    • IPC: G06F3/01; G06F3/0346; H04M1/725
    • CPC: G06F3/011; G06F1/163; G06F3/014; G06F3/017; G06F3/0346; H04M1/7253; H04M2250/12
    • Abstract: A "Natural Motion Controller" identifies various motions of one or more parts of a user's body to interact with electronic devices, thereby enabling various natural user interface (NUI) scenarios. The Natural Motion Controller constructs composite motion recognition windows by concatenating an adjustable number of sequential periods of inertial sensor data received from a plurality of separate sets of inertial sensors. Each of these separate sets of inertial sensors are coupled to, or otherwise provide sensor data relating to, a separate user worn, carried, or held mobile computing device. Each composite motion recognition window is then passed to a motion recognition model trained by one or more machine-based deep learning processes. This motion recognition model is then applied to the composite motion recognition windows to identify a sequence of one or more predefined motions. Identified motions are then used as the basis for triggering execution of one or more application commands.
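As a rough illustration of the "composite motion recognition window" idea, concatenating an adjustable number of sequential periods of inertial data from several devices, consider the sketch below. The stream layout, period sizes, and six-channel samples are assumptions for illustration; the trained deep-learning motion model that would consume the window is omitted:

```python
def composite_window(sensor_streams, num_periods, period_len):
    """Build one composite recognition window: for each of the most recent
    `num_periods` periods, concatenate that period's samples from every
    device's inertial stream (wearable, handheld, ...)."""
    window = []
    for p in range(num_periods):
        period_samples = []
        for stream in sensor_streams:
            start = len(stream) - (num_periods - p) * period_len
            period_samples.extend(stream[start:start + period_len])
        window.append(period_samples)
    return window

# Two hypothetical devices, each sample = (ax, ay, az, gx, gy, gz)
wrist = [(0.0,) * 6 for _ in range(100)]
phone = [(0.0,) * 6 for _ in range(100)]
window = composite_window([wrist, phone], num_periods=4, period_len=10)
# 4 periods, each holding 10 samples from each of the 2 devices
```

Making `num_periods` adjustable lets the recognizer trade latency against the length of the motion it can capture, which is the "adjustable number of sequential periods" the abstract refers to.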
    • 4. Invention Application
    • INTENT RECOGNITION AND EMOTIONAL TEXT-TO-SPEECH LEARNING SYSTEM
    • Publication No.: WO2017218243A2 (published 2017-12-21)
    • Application No.: PCT/US2017/036241 (filed 2017-06-07)
    • Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
    • Inventors: ZHAO, Pei; YAO, Kaisheng; LEUNG, Max; YAN, Bo; LUAN, Jian; SHI, Yu; MA, Malone; HWANG, Mei-Yuh
    • IPC: G10L25/63; G06F17/27; G10L15/26
    • CPC: G10L25/63; G06F17/2785; G06N3/0445; G06N7/005; G10L15/265
    • Abstract: An example intent-recognition system comprises a processor and memory storing instructions. The instructions cause the processor to receive speech input comprising spoken words. The instructions cause the processor to generate text results based on the speech input and generate acoustic feature annotations based on the speech input. The instructions also cause the processor to apply an intent model to the text result and the acoustic feature annotations to recognize an intent based on the speech input. An example system for adapting an emotional text-to-speech model comprises a processor and memory. The memory stores instructions that cause the processor to receive training examples comprising speech input and receive labelling data comprising emotion information associated with the speech input. The instructions also cause the processor to extract audio signal vectors from the training examples and generate an emotion-adapted voice font model based on the audio signal vectors and the labelling data.
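The intent model described above consumes both the recognized text and acoustic feature annotations. A minimal linear scorer over the concatenated features, with hand-set weights standing in for a trained model and hypothetical intent labels, might look like:

```python
def recognize_intent(text_features, acoustic_features, weights):
    """Score each intent as a linear function of the concatenated text and
    acoustic feature vectors and return the best-scoring one.  In a real
    system `weights` would be learned, and the features would come from a
    speech recognizer and an acoustic annotator rather than be given
    directly."""
    x = text_features + acoustic_features
    scores = {
        intent: sum(w * v for w, v in zip(ws, x))
        for intent, ws in weights.items()
    }
    return max(scores, key=scores.get)

# Illustrative: 2 text features + 1 acoustic feature (e.g. vocal arousal)
weights = {"query": [1.0, 0.0, 0.0], "command": [0.0, 1.0, 1.0]}
intent = recognize_intent([0.2, 0.9], [0.5], weights)
```

The point of combining the two feature sources is that the same words can carry different intents depending on how they are said; the acoustic annotations supply that missing signal.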
    • 5. Invention Application
    • DEEP NEURAL SUPPORT VECTOR MACHINES
    • Publication No.: WO2016165120A1 (published 2016-10-20)
    • Application No.: PCT/CN2015/076857 (filed 2015-04-17)
    • Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
    • Inventors: ZHANG, Shixiong; LIU, Chaojun; YAO, Kaisheng; GONG, Yifan
    • IPC: G10L15/02
    • CPC: G10L15/16; G06N3/02; G06N99/005; G10L15/187; G10L2015/025
    • Abstract: Aspects of the technology described herein relates to a new type of deep neural network (DNN). The new DNN is described herein as a deep neural support vector machine (DNSVM). Traditional DNNs use the multinomial logistic regression (softmax activation) at the top layer and underlying layers for training. The new DNN instead uses a support vector machine (SVM) as one or more layers, including the top layer. The technology described herein can use one of two training algorithms to train the DNSVM to learn parameters of SVM and DNN in the maximum-margin criteria. The first training method is a frame-level training. In the frame-level training, the new model is shown to be related to the multiclass SVM with DNN features. The second training method is the sequence-level training. The sequence-level training is related to the structured SVM with DNN features and HMM state transition features.
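The frame-level training the abstract mentions amounts to a multiclass maximum-margin (hinge) objective over per-class scores. A minimal Crammer-Singer-style loss is sketched below; in the DNSVM setting the scores would come from an SVM top layer applied to DNN features, which is omitted here:

```python
def multiclass_hinge_loss(scores, correct, margin=1.0):
    """Maximum-margin criterion for one frame: the correct class's score
    must exceed every competitor's by `margin`; otherwise the shortfall
    of the worst violator is the loss (Crammer-Singer multiclass hinge)."""
    violations = [
        scores[k] - scores[correct] + margin
        for k in range(len(scores))
        if k != correct
    ]
    return max(max(violations), 0.0)

# Correct class 0 leads class 2 by only 0.5 < margin 1.0, so loss = 0.5
loss = multiclass_hinge_loss([2.0, 0.5, 1.5], correct=0)
```

Replacing the usual softmax cross-entropy with this objective is what makes the top layer an SVM: training pushes the correct class's score a full margin above all competitors rather than merely maximizing its probability.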
    • 6. Invention Application
    • HYPER-STRUCTURE RECURRENT NEURAL NETWORKS FOR TEXT-TO-SPEECH
    • Publication No.: WO2015191968A1 (published 2015-12-17)
    • Application No.: PCT/US2015/035504 (filed 2015-06-12)
    • Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
    • Inventors: ZHAO, Pei; LEUNG, Max; YAO, Kaisheng; YAN, Bo; ZHAO, Sheng; ALLEVA, Fileno A.
    • IPC: G10L13/10
    • CPC: G10L13/08; G06N3/02; G06N3/0445; G10L13/10
    • Abstract: The technology relates to converting text to speech utilizing recurrent neural networks (RNNs). The recurrent neural networks may be implemented as multiple modules for determining properties of the text. In embodiments, a part-of-speech RNN module, letter-to-sound RNN module, a linguistic prosody tagger RNN module, and a context awareness and semantic mining RNN module may all be utilized. The properties from the RNN modules are processed by a hyper-structure RNN module that determines the phonetic properties of the input text based on the outputs of the other RNN modules. The hyper-structure RNN module may generate a generation sequence that is capable of being converted to audible speech by a speech synthesizer. The generation sequence may also be optimized by a global optimization module prior to being synthesized into audible speech.
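The modular pipeline described above, with several analysis RNNs feeding a hyper-structure combiner that emits a generation sequence, can be sketched with plain callables standing in for the RNN modules. The module names and toy outputs are illustrative only:

```python
def tts_front_end(text, modules, combiner):
    """Run every analysis module (part-of-speech, letter-to-sound,
    prosody, context) over the text, then let the hyper-structure
    combiner merge their outputs into one generation sequence for the
    synthesizer."""
    outputs = {name: module(text) for name, module in modules.items()}
    return combiner(outputs)

# Toy stand-ins for the patent's RNN modules:
modules = {
    "pos": lambda t: ["DT", "NN"],                  # part-of-speech tags
    "lts": lambda t: ["dh", "ax", "k", "ae", "t"],  # letter-to-sound
    "prosody": lambda t: ["L", "H"],                # prosody tags
}
# Toy combiner: attach the leading prosody tag to each phoneme
combiner = lambda o: [(ph, o["prosody"][0]) for ph in o["lts"]]
seq = tts_front_end("the cat", modules, combiner)
```

The sketch only shows the data flow: independent analyses fan in to one combining module, whose output sequence would then pass through global optimization and the synthesizer.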