Patent Quick Search - Search patents worldwide, free patent database for commercial use - IPRDB

1. Invention patent application

WO2015134294A1 LOW-FOOTPRINT ADAPTATION AND PERSONALIZATION FOR A DEEP NEURAL NETWORK (Status: pending, published)
Publication No.: WO2015134294A1
Publication date: 2015-09-11
Application No.: PCT/US2015/017872
Filing date: 2015-02-27
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: XUE, Jian , LI, Jinyu , YU, Dong , SELTZER, Michael L. , GONG, Yifan
IPC classification: G10L15/07 , G10L15/16
CPC classification: G10L15/16 , G06N3/082 , G10L15/075
Abstract: The adaptation and personalization of a deep neural network (DNN) model for automatic speech recognition is provided. An utterance which includes speech features for one or more speakers may be received in ASR tasks such as voice search or short message dictation. A decomposition approach may then be applied to an original matrix in the DNN model. In response to applying the decomposition approach, the original matrix may be converted into multiple new matrices which are smaller than the original matrix. A square matrix may then be added to the new matrices. Speaker-specific parameters may then be stored in the square matrix. The DNN model may then be adapted by updating the square matrix. This process may be applied to all of a number of original matrices in the DNN model. The adapted DNN model may include fewer parameters than the original DNN model.
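The decompose-then-insert-a-square-matrix idea in this abstract can be illustrated with a minimal pure-Python sketch. It assumes the original m×n weight matrix has already been factored into an m×k matrix `U` and a k×n matrix `V`, for example by a truncated singular value decomposition. The abstract itself only says "a decomposition approach". A k×k square matrix `S` sits between the two factors. It is initialized to the identity, so the adapted layer starts out identical to the factored one, and only `S` would be updated per speaker. All names and sizes here are hypothetical, not taken from the application.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "shape mismatch"
    cols = len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(len(a))]

def identity(k):
    """k x k identity: the speaker-specific square matrix before adaptation."""
    return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]

def adapted_layer(U, S, V):
    """Effective weight matrix U @ S @ V of a factored, adapted layer."""
    return matmul(matmul(U, S), V)

def parameter_counts(m, n, k):
    """Parameters in the full matrix, in the two factors, and in the square matrix."""
    return {"original": m * n, "factored": m * k + k * n, "per_speaker": k * k}

# Hypothetical 3x2 layer already factored with rank k = 1.
U = [[1.0], [2.0], [3.0]]   # m x k
V = [[0.5, -1.0]]           # k x n
S = identity(1)             # k x k, the only part updated per speaker
print(adapted_layer(U, S, V))            # [[0.5, -1.0], [1.0, -2.0], [1.5, -3.0]]
print(parameter_counts(2048, 2048, 256))
```

With the hypothetical sizes m = n = 2048 and k = 256, each speaker needs 65,536 stored values instead of the 4,194,304 in the full matrix. The shared factors account for another 1,048,576.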

2. Invention patent application

WO00051105A1 SUPERVISED ADAPTATION USING CORRECTIVE N-BEST DECODING (Status: pending, published)
Publication No.: WO00051105A1
Publication date: 2000-08-31
Application No.: PCT/US2000/001838
Filing date: 2000-01-25
IPC classification: G10L15/06 , G10L15/14
CPC classification: G10L15/075 , G10L2015/0635
Abstract: Supervised adaptation speech is supplied to the recognizer (10) and the recognizer generates the N-best transcriptions of the adaptation speech (14). These transcriptions include the one transcription known to be correct, based on prior knowledge of the adaptation speech, and the remaining transcriptions known to be incorrect. The system applies weights to each transcription (16): a positive weight to the correct transcription and negative weights to the incorrect transcriptions. These weights have the effect of moving the incorrect transcriptions away from the correct one, rendering the recognition system more discriminative for the new speaker's speaking characteristics. Weights applied to the incorrect solutions are based on the respective likelihood scores generated by the recognizer. Preferably, the sum of all weights (positive and negative) is a positive number. This ensures that the system will converge.
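The weighting rule in this abstract can be realized in several ways; the sketch below shows one. The specific formula is an assumption for illustration, not the applicant's. The correct transcription gets weight 1.0. Each incorrect one gets a negative weight proportional to its share of the incorrect hypotheses' likelihood, computed as a softmax over log-likelihood scores. The hypothetical parameter `beta < 1` caps the total negative mass, so the weights always sum to a positive number, which matches the convergence condition stated above.

```python
import math

def nbest_weights(log_likelihoods, correct_index, beta=0.5):
    """Give the known-correct hypothesis a positive weight and the incorrect
    ones negative weights based on their likelihood scores.

    log_likelihoods: recognizer scores for the N-best list.
    beta: total negative mass; beta < 1 keeps the sum of all weights positive.
    """
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must be in [0, 1) so the weights sum to a positive value")
    incorrect = [i for i in range(len(log_likelihoods)) if i != correct_index]
    weights = [0.0] * len(log_likelihoods)
    weights[correct_index] = 1.0
    if incorrect:
        # Softmax over the incorrect hypotheses, shifted by the max for numerical stability.
        top = max(log_likelihoods[i] for i in incorrect)
        exps = {i: math.exp(log_likelihoods[i] - top) for i in incorrect}
        total = sum(exps.values())
        for i in incorrect:
            # More likely competitors are pushed away harder.
            weights[i] = -beta * exps[i] / total
    return weights

w = nbest_weights([-10.0, -12.0, -12.0], correct_index=0, beta=0.5)
print(w)       # [1.0, -0.25, -0.25]
print(sum(w))  # 0.5
```

Because the negative weights always add up to exactly `-beta`, the total is `1 - beta`, which is positive for any allowed `beta`.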

3. Invention patent application

WO2017099936A1 SYSTEM AND METHODS FOR ADAPTING NEURAL NETWORK ACOUSTIC MODELS (Status: pending, published)
Publication No.: WO2017099936A1
Publication date: 2017-06-15
Application No.: PCT/US2016/061326
Filing date: 2016-11-10
Applicant: NUANCE COMMUNICATIONS, INC.
Inventors: ZHAN, Puming , LI, Xinwei
IPC classification: G10L15/16 , G10L15/07
CPC classification: G10L15/075 , G10L15/07 , G10L15/14 , G10L15/16 , G10L17/02
Abstract: Techniques for adapting a trained neural network acoustic model, comprising using at least one computer hardware processor to perform: generating initial speaker information values for a speaker; generating first speech content values from first speech data corresponding to a first utterance spoken by the speaker; processing the first speech content values and the initial speaker information values using the trained neural network acoustic model; recognizing, using automatic speech recognition, the first utterance based, at least in part, on results of the processing; generating updated speaker information values using the first speech data and at least one of the initial speaker information values and/or information used to generate the initial speaker information values; and recognizing, based at least in part on the updated speaker information values, a second utterance spoken by the speaker.
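The abstract leaves the form of the "speaker information values" open. The sketch below assumes, purely for illustration, that they are a per-dimension average of acoustic feature frames. After each recognized utterance they are updated as a running mean, in which the initial values count as `prior_weight` pseudo-frames. `SpeakerInfo`, `update_speaker_info` and `prior_weight` are hypothetical names. Recognition itself is outside the sketch: the first utterance would be decoded with the initial values and the second with the updated ones.

```python
from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class SpeakerInfo:
    """Hypothetical speaker-information estimate fed to the acoustic model."""
    values: List[float]
    frame_count: int = 0        # frames of this speaker's speech seen so far
    prior_weight: float = 10.0  # pseudo-frames credited to the initial values

def update_speaker_info(info: SpeakerInfo, frames: Sequence[Sequence[float]]) -> SpeakerInfo:
    """Fold the feature frames of a just-recognized utterance into the estimate.

    Running mean: the current values act as if backed by
    frame_count + prior_weight frames, so early utterances move the estimate
    quickly and later ones refine it. Returns a new object; `info` is unchanged.
    """
    weight = info.frame_count + info.prior_weight
    total = weight + len(frames)
    if total == 0:
        return info
    sums = [v * weight for v in info.values]
    for frame in frames:
        for d, x in enumerate(frame):
            sums[d] += x
    return SpeakerInfo([s / total for s in sums],
                       info.frame_count + len(frames),
                       info.prior_weight)

initial = SpeakerInfo([0.0, 0.0], prior_weight=2.0)
after_first = update_speaker_info(initial, [[1.0, 2.0], [3.0, 4.0]])
print(after_first.values)   # [1.0, 1.5]
after_second = update_speaker_info(after_first, [[4.0, 6.0], [4.0, 6.0]])
print(after_second.values)  # [2.0, 3.0]
```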

4. Invention patent application

WO2014133525A1 SERVER-SIDE ASR ADAPTATION TO SPEAKER, DEVICE AND NOISE CONDITION VIA NON-ASR AUDIO TRANSMISSION (Status: pending, published)
Publication No.: WO2014133525A1
Publication date: 2014-09-04
Application No.: PCT/US2013/028288
Filing date: 2013-02-28
Applicant: NUANCE COMMUNICATIONS, INC.
Inventors: WILLETT, Daniel , DAHAN, Jean-guy, E. , GANONG, William, F. , WU, Jianxiong
IPC classification: G10L21/02
CPC classification: G10L15/075 , G06F3/167 , G10L15/07 , G10L15/14 , G10L15/1815 , G10L15/20 , G10L15/22 , G10L15/30 , G10L25/84 , G10L2015/223 , G10L2015/226
Abstract: A mobile device is adapted for automatic speech recognition (ASR). A user interface for interaction with a user includes an input microphone for obtaining speech inputs from the user for automatic speech recognition, and an output interface for system output to the user based on ASR results that correspond to the speech input. A local controller obtains a sample of non-ASR audio from the input microphone for ASR-adaptation to channel-specific ASR characteristics, and then provides a representation of the non-ASR audio to a remote ASR server for server-side adaptation to the channel-specific ASR characteristics, and then provides a representation of an unknown ASR speech input from the input microphone to the remote ASR server for determining ASR results corresponding to the unknown ASR speech input, and then provides the system output to the output interface.
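The abstract does not say how the server uses the non-ASR audio. The sketch below stands in magnitude spectral subtraction, one plausible choice and not the applicant's method. The device averages the magnitude spectra of frames captured outside an ASR request into a noise profile, and the server subtracts that profile from later speech frames. Framing and the FFT are omitted, so frames are assumed to be precomputed magnitude spectra. `noise_profile`, `spectral_subtract` and `floor` are hypothetical names.

```python
from typing import List, Sequence

def noise_profile(non_asr_frames: Sequence[Sequence[float]]) -> List[float]:
    """Client side: mean magnitude spectrum of audio captured outside an ASR
    request (e.g. background sound while nobody is speaking)."""
    if not non_asr_frames:
        raise ValueError("need at least one non-ASR frame")
    n = len(non_asr_frames)
    bins = len(non_asr_frames[0])
    return [sum(frame[b] for frame in non_asr_frames) / n for b in range(bins)]

def spectral_subtract(speech_frames, profile, floor=0.01):
    """Server side: subtract the stored profile from each speech frame's
    magnitude spectrum, keeping at least floor * the original magnitude."""
    return [[max(x - p, floor * x) for x, p in zip(frame, profile)]
            for frame in speech_frames]

profile = noise_profile([[0.25, 0.5], [0.75, 0.5]])  # sent to the server once
print(profile)                                       # [0.5, 0.5]
print(spectral_subtract([[2.0, 0.5]], profile))
```

Only the compact profile, not the raw background audio, would need to reach the server before the actual speech request.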

5. Invention patent application

WO2017108142A1 LINGUISTIC MODEL SELECTION FOR ADAPTIVE AUTOMATIC SPEECH RECOGNITION (Status: pending, published)
Publication No.: WO2017108142A1
Publication date: 2017-06-29
Application No.: PCT/EP2015/081243
Filing date: 2015-12-24
Applicant: INTEL CORPORATION , PEREZ, Guillermo
Inventors: PEREZ, Guillermo , SHELLEF, Eric Ariel , SHILON, Reshef , GRAFF, Peter , ENG, Jonathan , LUCAS, Juan Manuel , VAN DEN BERG, Martin Henk
IPC classification: G10L15/24
CPC classification: G06F17/30032 , G06K2009/00939 , G10L15/075 , G10L15/183 , G10L15/24 , G10L17/02 , G10L17/22
Abstract: The present disclosure describes dynamically adjusting linguistic models for automatic speech recognition based on biometric information to produce a more reliable speech recognition experience. Embodiments include receiving a speech signal, receiving a biometric signal from a biometric sensor implemented at least partially in hardware, determining a linguistic model based on the biometric signal, and processing the speech signal for speech recognition using the linguistic model based on the biometric signal.

6. Invention patent application

WO2016122942A1 DYNAMIC INFERENCE OF VOICE COMMAND FOR SOFTWARE OPERATION FROM HELP INFORMATION (Status: pending, published)
Publication No.: WO2016122942A1
Publication date: 2016-08-04
Application No.: PCT/US2016/014118
Filing date: 2016-01-20
Applicant: GOOGLE TECHNOLOGY HOLDINGS LLC
Inventors: AGRAWAL, Amit Kumar , ESSICK, Raymond B. , ROUT, Satyabrata
IPC classification: G06F3/16
CPC classification: G10L15/075 , G06F3/04842 , G06F3/0488 , G06F3/167 , G06F2203/0381 , G10L21/06
Abstract: In an electronic device (100), a method includes analyzing help information (160, 1002) associated with a software application (214) to identify a sequence of manipulations of viewable elements associated with an instance of an operation by the software application. The method further includes generating a voice command set (1802) based on the sequence of manipulations of viewable elements and storing the voice command set. The method further includes receiving voice input (162) from a user, determining that the voice input represents a voice command of the voice command set, and performing an emulated manipulation sequence (370) of viewable elements based on the voice command to actuate an instance of the operation by the software application, the emulated manipulation sequence being based on the sequence of manipulations of viewable elements.

7. Invention patent application

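WO2015079885A1 統計的音響モデルの適応方法、統計的音響モデルの適応に適した音響モデルの学習方法、ディープ・ニューラル・ネットワークを構築するためのパラメータを記憶した記憶媒体、及び統計的音響モデルの適応を行なうためのコンピュータプログラム (Status: pending, published)
Title translation: Method for adapting a statistical acoustic model, method for training an acoustic model suitable for adapting a statistical acoustic model, storage medium storing parameters for constructing a deep neural network, and computer program for adapting a statistical acoustic model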
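Publication No.: WO2015079885A1
Publication date: 2015-06-04
Application No.: PCT/JP2014/079490
Filing date: 2014-11-06
Applicant: 独立行政法人情報通信研究機構 (National Institute of Information and Communications Technology)
Inventors: 松田　繁樹 (MATSUDA, Shigeki) , ルー・シュガン (LU, Xugang)
IPC classification: G10L15/07 , G06N3/00 , G06N3/08 , G10L15/16
CPC classification: G10L15/16 , G06N3/04 , G06N3/0454 , G06N3/08 , G06N3/082 , G10L15/063 , G10L15/075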
Abstract: [Problem] To provide a statistical acoustic model adaptation method that efficiently adapts a DNN-based acoustic model using training data for a specific condition, while also improving accuracy. [Solution] A speaker adaptation method for a DNN-based acoustic model includes: a step of separately storing utterance data 90 to 98 of different speakers in a first storage device; a step of preparing speaker-specific hidden layer modules 112 to 120; a step of performing preliminary training on all layers 42, 44, 110, 48, 50, 52 and 54 of a DNN 80 while switching among and selecting the utterance data 90 to 98 and dynamically replacing a specific layer 110 with the hidden layer module 112 to 120 corresponding to the selected utterance data; a step of replacing the specific layer 110 of the DNN on which preliminary training has been completed with an initial hidden layer; and a step of fixing the parameters of the layers other than the initial hidden layer and training the DNN with speech data of a specific speaker.
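The training schedule in this abstract can be traced with the sketch below. It performs no gradient updates. For each stage, it only records which layers make up the network and which of them are trainable. Layer names follow the reference numerals quoted in the abstract, with layer 110 at index 2. The speaker keys, module names and the functions `preliminary_plan` and `adaptation_plan` are hypothetical.

```python
import itertools
from typing import Dict, Iterator, List, Set, Tuple

def preliminary_plan(layers: List[str], specific: int, speaker_modules: Dict[str, str],
                     steps: int) -> Iterator[Tuple[str, List[str], Set[str]]]:
    """Yield (speaker, layer stack, trainable layers) for each preliminary step.

    Speakers' utterance data are cycled in turn; the layer at index `specific`
    is swapped for that speaker's hidden-layer module and all layers are trained.
    """
    speakers = itertools.cycle(sorted(speaker_modules))
    for _ in range(steps):
        speaker = next(speakers)
        stack = list(layers)
        stack[specific] = speaker_modules[speaker]
        yield speaker, stack, set(stack)

def adaptation_plan(layers: List[str], specific: int,
                    initial_layer: str) -> Tuple[List[str], Set[str]]:
    """After preliminary training: put an initial hidden layer in the specific
    position and freeze every other layer for target-speaker training."""
    stack = list(layers)
    stack[specific] = initial_layer
    return stack, {initial_layer}

layers = ["L42", "L44", "L110", "L48", "L50", "L52", "L54"]
modules = {"spk_a": "H_a", "spk_b": "H_b"}  # hypothetical speaker-specific modules
for speaker, stack, trainable in preliminary_plan(layers, 2, modules, steps=3):
    print(speaker, stack[2], len(trainable))
print(adaptation_plan(layers, 2, "H_init"))
```

Freezing everything except the swapped-in layer means only one hidden layer's parameters would have to be learned from the target speaker's data.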

8. Invention patent application

WO2018106553A1 CLOUD AND NAME OPTIMIZED SPEECH RECOGNITION (Status: pending, published)
Publication No.: WO2018106553A1
Publication date: 2018-06-14
Application No.: PCT/US2017/064391
Filing date: 2017-12-04
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: HESS, Hans Peter , SHIR, Oren , VERMA, Naveen Kumar
IPC classification: G10L15/193 , G10L15/19
CPC classification: G10L15/063 , G06F17/20 , G10L15/075 , G10L15/19 , G10L15/30 , G10L2015/0635 , H04M3/4936 , H04M7/0021
Abstract: A name file service is described that optimizes speech recognition in the cloud environment. The name file service monitors changes of users associated with tenant accounts and automatically updates a name file (or dictionary of names) for generating a grammar file used by speech recognition services. The described service may be used by auto-attendant applications as one example.
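The update loop described in this abstract might look like the sketch below, under two assumptions. First, user changes arrive as simple add/remove events per tenant. Second, the grammar file is a JSGF-style rule with one alternative per name, where `<VOID>` stands in for an empty name list. The class `NameFileService`, the event format and the tenant id are hypothetical. Names containing punctuation or non-Latin characters would need escaping or pronunciation handling that is not shown here.

```python
from typing import Dict, Set

class NameFileService:
    """Keeps a per-tenant name dictionary and grammar in sync with user changes."""

    def __init__(self) -> None:
        self.names: Dict[str, Set[str]] = {}  # tenant id -> display names
        self.grammars: Dict[str, str] = {}    # tenant id -> generated grammar text

    def apply_change(self, tenant: str, action: str, name: str) -> None:
        """Apply one user add/remove event; rebuild the grammar only when needed."""
        names = self.names.setdefault(tenant, set())
        before = len(names)
        if action == "add":
            names.add(name)
        elif action == "remove":
            names.discard(name)
        else:
            raise ValueError(f"unknown action: {action}")
        if len(names) != before or tenant not in self.grammars:
            self.grammars[tenant] = self.build_grammar(names)

    @staticmethod
    def build_grammar(names: Set[str]) -> str:
        """Render names as a JSGF-style rule; <VOID> stands in when the set is empty."""
        alternatives = " | ".join(sorted(names)) or "<VOID>"
        return f"#JSGF V1.0;\ngrammar names;\npublic <name> = {alternatives};\n"

service = NameFileService()
service.apply_change("contoso", "add", "Alice Smith")
service.apply_change("contoso", "add", "Bob Lee")
print(service.grammars["contoso"])
```

Regenerating the grammar only when the name set actually changes keeps repeated or no-op events from triggering needless rebuilds.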

9. Invention patent application

WO2014133525A8 SERVER-SIDE ASR ADAPTATION TO SPEAKER, DEVICE AND NOISE CONDITION VIA NON-ASR AUDIO TRANSMISSION (Status: pending, published)
Publication No.: WO2014133525A8
Publication date: 2015-09-17
Application No.: PCT/US2013/028288
Filing date: 2013-02-28
Applicant: NUANCE COMMUNICATIONS INC
Inventors: WILLETT DANIEL , DAHAN JEAN-GUY E , GANONG III WILLIAM F , WU JIANXIONG
IPC classification: G10L21/02
CPC classification: G10L15/075 , G06F3/167 , G10L15/07 , G10L15/14 , G10L15/1815 , G10L15/20 , G10L15/22 , G10L15/30 , G10L25/84 , G10L2015/223 , G10L2015/226
Abstract: A mobile device is adapted for automatic speech recognition (ASR). A user interface for interaction with a user includes an input microphone for obtaining speech inputs from the user for automatic speech recognition, and an output interface for system output to the user based on ASR results that correspond to the speech input. A local controller obtains a sample of non-ASR audio from the input microphone for ASR-adaptation to channel-specific ASR characteristics, and then provides a representation of the non-ASR audio to a remote ASR server for server-side adaptation to the channel-specific ASR characteristics, and then provides a representation of an unknown ASR speech input from the input microphone to the remote ASR server for determining ASR results corresponding to the unknown ASR speech input, and then provides the system output to the output interface.

10. Invention patent application

WO2007024427A1 SYSTEM AND METHOD FOR DISTRIBUTING A SPEECH-RECOGNITION GRAMMAR (Status: pending, published)
Publication No.: WO2007024427A1
Publication date: 2007-03-01
Application No.: PCT/US2006/030011
Filing date: 2006-08-01
Applicant: CISCO TECHNOLOGY, INC. , CHESTNUT, Kevin, L. , BURTON, Joseph, B.
Inventors: CHESTNUT, Kevin, L. , BURTON, Joseph, B.
IPC classification: G10L15/06
CPC classification: G10L15/075
Abstract: A method for distributing voice-recognition grammars includes receiving match data from a first remote element. The match data includes information associated with an attempt by the remote element to match received audio information to first stored audio data. The method also includes generating a grammar entry based on the match data. The grammar entry includes second stored audio data and a word identifier associated with the second stored audio data. Additionally, the method includes transmitting the grammar entry to a second remote element.
