    • 2. Invention application
    • TOKEN-LEVEL INTERPOLATION FOR CLASS-BASED LANGUAGE MODELS
    • Publication number: WO2016144988A1
    • Publication date: 2016-09-15
    • Application number: PCT/US2016/021416
    • Filing date: 2016-03-09
    • Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
    • Inventors: LEVIT, Michael; PARTHASARATHY, Sarangarajan; STOLCKE, Andreas; CHANG, Shuangyu
    • IPC: G10L15/06; G10L15/197; G10L15/18
    • CPC: G10L15/183; G10L15/063; G10L15/1815; G10L15/197
    • Abstract: Optimized language models are provided for in-domain applications through an iterative, joint-modeling approach that interpolates a language model (LM) from a number of component LMs according to interpolation weights optimized for a target domain. The component LMs may include class-based LMs, and the interpolation may be context-specific or context-independent. Through iterative processes, the component LMs may be interpolated and used to express training material as alternative representations or parses of tokens. Posterior probabilities may be determined for these parses and used for determining new (or updated) interpolation weights for the LM components, such that a combination or interpolation of component LMs is further optimized for the domain. The component LMs may be merged, according to the optimized weights, into a single, combined LM, for deployment in an application scenario.
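The abstract above describes an iterative, EM-style re-estimation of interpolation weights from token posteriors. As a rough illustration rather than the patented method itself, the Python sketch below estimates context-independent linear-interpolation weights for a set of component LMs on in-domain text; the `ComponentLM` interface and the function name are hypothetical, and the class-based parsing step mentioned in the abstract is omitted.

```python
from typing import Callable, List, Sequence

# A component LM is modeled here as a callable returning P(token | history).
# ComponentLM and optimize_interpolation_weights are illustrative names,
# not taken from the patent or any Microsoft API.
ComponentLM = Callable[[str, Sequence[str]], float]

def optimize_interpolation_weights(
    components: List[ComponentLM],
    corpus: List[List[str]],          # in-domain sentences, already tokenized
    iterations: int = 20,
) -> List[float]:
    """EM-style estimation of linear-interpolation weights on in-domain text."""
    k = len(components)
    weights = [1.0 / k] * k           # start from uniform weights
    for _ in range(iterations):
        expected = [0.0] * k          # accumulated posteriors per component
        count = 0
        for sentence in corpus:
            for pos, token in enumerate(sentence):
                history = sentence[:pos]
                probs = [c(token, history) for c in components]
                mix = sum(w * p for w, p in zip(weights, probs))
                if mix <= 0.0:
                    continue          # skip tokens no component can score
                for i in range(k):
                    # Posterior responsibility of component i for this token.
                    expected[i] += weights[i] * probs[i] / mix
                count += 1
        if count:
            weights = [e / count for e in expected]
    return weights
```

Once the weights converge, the weighted components could in principle be merged into a single combined LM for deployment, which is the final step the abstract mentions.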
    • 3. Invention application
    • INCREMENTAL UTTERANCE DECODER COMBINATION FOR EFFICIENT AND ACCURATE DECODING
    • Publication number: WO2015142769A1
    • Publication date: 2015-09-24
    • Application number: PCT/US2015/020849
    • Filing date: 2015-03-17
    • Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
    • Inventors: CHANG, Shuangyu; LEVIT, Michael; LAHIRI, Abhik; OGUZ, Barlas; DUMOULIN, Benoit
    • IPC: G10L15/32
    • CPC: G10L15/32; G10L15/063; G10L15/14; G10L19/005
    • Abstract: An incremental speech recognition system. The incremental speech recognition system incrementally decodes a spoken utterance using an additional utterance decoder only when the additional utterance decoder is likely to add significant benefit to the combined result. The available utterance decoders are ordered in a series based on accuracy, performance, diversity, and other factors. A recognition management engine coordinates decoding of the spoken utterance by the series of utterance decoders, combines the decoded utterances, and determines whether additional processing is likely to significantly improve the recognition result. If so, the recognition management engine engages the next utterance decoder and the cycle continues. If the accuracy cannot be significantly improved, the result is accepted and decoding stops. Accordingly, a decoded utterance with accuracy approaching the maximum for the series is obtained without decoding the spoken utterance using all utterance decoders in the series, thereby minimizing resource usage.
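To make the decoding loop in this abstract concrete, here is a minimal Python sketch of engaging an ordered series of decoders one at a time and stopping once another decoder is unlikely to add significant benefit. The `Decoder` interface, the confidence-based combination, and the stopping rule are illustrative assumptions, not the scheme claimed in the application.

```python
from typing import Callable, List, Tuple

# Each decoder returns (hypothesis text, confidence in [0, 1]).
Decoder = Callable[[bytes], Tuple[str, float]]

def recognize_incrementally(
    audio: bytes,
    decoders: List[Decoder],          # ordered offline by accuracy, speed, diversity
    min_expected_gain: float = 0.02,
) -> str:
    best_hyp, best_conf = "", 0.0
    for decoder in decoders:
        hyp, conf = decoder(audio)
        # Naive combination: keep the higher-confidence hypothesis.  A real
        # system would combine word lattices or confusion networks instead.
        if conf > best_conf:
            best_hyp, best_conf = hyp, conf
        # Stop early once the remaining headroom (1 - confidence) is small
        # enough that engaging another decoder is unlikely to pay off.
        if (1.0 - best_conf) < min_expected_gain:
            break
    return best_hyp
```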
    • 6. Invention application
    • DISCRIMINATIVE DATA SELECTION FOR LANGUAGE MODELING
    • Publication number: WO2016183110A1
    • Publication date: 2016-11-17
    • Application number: PCT/US2016/031690
    • Filing date: 2016-05-11
    • Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
    • Inventors: LEVIT, Michael; CHANG, Shuangyu; DUMOULIN, Benoit
    • IPC: G10L15/19
    • CPC: G10L15/063; G10L15/10; G10L15/14; G10L15/18; G10L15/19; G10L2015/0633; G10L2015/0635
    • Abstract: A computer system for language modeling may collect training data from one or more information sources, generate a spoken corpus containing text of transcribed speech, and generate a typed corpus containing typed text. The computer system may derive feature vectors from the spoken corpus, analyze the typed corpus to determine feature vectors representing items of typed text, and generate an unspeakable corpus by filtering the typed corpus to remove each item of typed text represented by a feature vector that is within a similarity threshold of a feature vector derived from the spoken corpus. The computer system may derive feature vectors from the unspeakable corpus and train a classifier to perform discriminative data selection for language modeling based on the feature vectors derived from the spoken corpus and the feature vectors derived from the unspeakable corpus.
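As a rough sketch of the pipeline this abstract describes, the Python below filters a typed corpus against a spoken corpus with a similarity threshold to form an "unspeakable" corpus and then trains a discriminative selection classifier. TF-IDF features, cosine similarity, and logistic regression (via scikit-learn) are stand-in choices; the application does not specify these particular features or this classifier.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import cosine_similarity

def build_selection_classifier(spoken, typed, similarity_threshold=0.6):
    """Train a spoken-vs-unspeakable classifier for data selection.

    `spoken` and `typed` are lists of strings.  The feature representation
    and classifier here are illustrative substitutes for whatever the
    patent application actually uses.
    """
    vectorizer = TfidfVectorizer()
    spoken_vecs = vectorizer.fit_transform(spoken)
    typed_vecs = vectorizer.transform(typed)

    # Filter the typed corpus: drop every item whose nearest spoken item is
    # within the similarity threshold; what remains is the "unspeakable" corpus.
    max_sim = cosine_similarity(typed_vecs, spoken_vecs).max(axis=1)
    unspeakable = [t for t, s in zip(typed, max_sim) if s < similarity_threshold]

    # Train a discriminative classifier: spoken (positive) vs unspeakable (negative).
    texts = spoken + unspeakable
    labels = np.array([1] * len(spoken) + [0] * len(unspeakable))
    clf = LogisticRegression(max_iter=1000).fit(vectorizer.transform(texts), labels)
    return vectorizer, clf
```

The returned classifier could then score new typed text and keep only the items it judges spoken-like as additional language-model training data, which is the data-selection use the abstract points to.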