    • 2. Invention application
    • TOKEN-LEVEL INTERPOLATION FOR CLASS-BASED LANGUAGE MODELS
    • Publication number: WO2016144988A1
    • Publication date: 2016-09-15
    • Application number: PCT/US2016/021416
    • Filing date: 2016-03-09
    • Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
    • Inventors: LEVIT, Michael; PARTHASARATHY, Sarangarajan; STOLCKE, Andreas; CHANG, Shuangyu
    • IPC: G10L15/06; G10L15/197; G10L15/18
    • CPC: G10L15/183; G10L15/063; G10L15/1815; G10L15/197
    • Abstract: Optimized language models are provided for in-domain applications through an iterative, joint-modeling approach that interpolates a language model (LM) from a number of component LMs according to interpolation weights optimized for a target domain. The component LMs may include class-based LMs, and the interpolation may be context-specific or context-independent. Through iterative processes, the component LMs may be interpolated and used to express training material as alternative representations or parses of tokens. Posterior probabilities may be determined for these parses and used for determining new (or updated) interpolation weights for the LM components, such that a combination or interpolation of component LMs is further optimized for the domain. The component LMs may be merged, according to the optimized weights, into a single, combined LM, for deployment in an application scenario.
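The abstract above describes an iterative, EM-style re-estimation of interpolation weights from token posteriors. As a rough illustration rather than the patented method itself, the Python sketch below estimates context-independent linear-interpolation weights for a set of component LMs on in-domain text; the `ComponentLM` interface and the function name are hypothetical, and the class-based parsing step mentioned in the abstract is omitted.

```python
from typing import Callable, List, Sequence

# A component LM is modeled here as a callable returning P(token | history).
# ComponentLM and optimize_interpolation_weights are illustrative names,
# not taken from the patent or any Microsoft API.
ComponentLM = Callable[[str, Sequence[str]], float]

def optimize_interpolation_weights(
    components: List[ComponentLM],
    corpus: List[List[str]],          # in-domain sentences, already tokenized
    iterations: int = 20,
) -> List[float]:
    """EM-style estimation of linear-interpolation weights on in-domain text."""
    k = len(components)
    weights = [1.0 / k] * k           # start from uniform weights
    for _ in range(iterations):
        expected = [0.0] * k          # accumulated posteriors per component
        count = 0
        for sentence in corpus:
            for pos, token in enumerate(sentence):
                history = sentence[:pos]
                probs = [c(token, history) for c in components]
                mix = sum(w * p for w, p in zip(weights, probs))
                if mix <= 0.0:
                    continue          # skip tokens no component can score
                for i in range(k):
                    # Posterior responsibility of component i for this token.
                    expected[i] += weights[i] * probs[i] / mix
                count += 1
        if count:
            weights = [e / count for e in expected]
    return weights
```

Once the weights converge, the weighted components could in principle be merged into a single combined LM for deployment, which is the final step the abstract mentions.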
    • 3. Invention application
    • INCREMENTAL UTTERANCE DECODER COMBINATION FOR EFFICIENT AND ACCURATE DECODING
    • Publication number: WO2015142769A1
    • Publication date: 2015-09-24
    • Application number: PCT/US2015/020849
    • Filing date: 2015-03-17
    • Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
    • Inventors: CHANG, Shuangyu; LEVIT, Michael; LAHIRI, Abhik; OGUZ, Barlas; DUMOULIN, Benoit
    • IPC: G10L15/32
    • CPC: G10L15/32; G10L15/063; G10L15/14; G10L19/005
    • Abstract: An incremental speech recognition system. The incremental speech recognition system incrementally decodes a spoken utterance using an additional utterance decoder only when the additional utterance decoder is likely to add significant benefit to the combined result. The available utterance decoders are ordered in a series based on accuracy, performance, diversity, and other factors. A recognition management engine coordinates decoding of the spoken utterance by the series of utterance decoders, combines the decoded utterances, and determines whether additional processing is likely to significantly improve the recognition result. If so, the recognition management engine engages the next utterance decoder and the cycle continues. If the accuracy cannot be significantly improved, the result is accepted and decoding stops. Accordingly, a decoded utterance with accuracy approaching the maximum for the series is obtained without decoding the spoken utterance using all utterance decoders in the series, thereby minimizing resource usage.
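To make the decoding loop in this abstract concrete, here is a minimal Python sketch of engaging an ordered series of decoders one at a time and stopping once another decoder is unlikely to add significant benefit. The `Decoder` interface, the confidence-based combination, and the stopping rule are illustrative assumptions, not the scheme claimed in the application.

```python
from typing import Callable, List, Tuple

# Each decoder returns (hypothesis text, confidence in [0, 1]).
Decoder = Callable[[bytes], Tuple[str, float]]

def recognize_incrementally(
    audio: bytes,
    decoders: List[Decoder],          # ordered offline by accuracy, speed, diversity
    min_expected_gain: float = 0.02,
) -> str:
    best_hyp, best_conf = "", 0.0
    for decoder in decoders:
        hyp, conf = decoder(audio)
        # Naive combination: keep the higher-confidence hypothesis.  A real
        # system would combine word lattices or confusion networks instead.
        if conf > best_conf:
            best_hyp, best_conf = hyp, conf
        # Stop early once the remaining headroom (1 - confidence) is small
        # enough that engaging another decoder is unlikely to pay off.
        if (1.0 - best_conf) < min_expected_gain:
            break
    return best_hyp
```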
    • 6. Invention application
    • DISCRIMINATIVE DATA SELECTION FOR LANGUAGE MODELING
    • Publication number: WO2016183110A1
    • Publication date: 2016-11-17
    • Application number: PCT/US2016/031690
    • Filing date: 2016-05-11
    • Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
    • Inventors: LEVIT, Michael; CHANG, Shuangyu; DUMOULIN, Benoit
    • IPC: G10L15/19
    • CPC: G10L15/063; G10L15/10; G10L15/14; G10L15/18; G10L15/19; G10L2015/0633; G10L2015/0635
    • Abstract: A computer system for language modeling may collect training data from one or more information sources, generate a spoken corpus containing text of transcribed speech, and generate a typed corpus containing typed text. The computer system may derive feature vectors from the spoken corpus, analyze the typed corpus to determine feature vectors representing items of typed text, and generate an unspeakable corpus by filtering the typed corpus to remove each item of typed text represented by a feature vector that is within a similarity threshold of a feature vector derived from the spoken corpus. The computer system may derive feature vectors from the unspeakable corpus and train a classifier to perform discriminative data selection for language modeling based on the feature vectors derived from the spoken corpus and the feature vectors derived from the unspeakable corpus.
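As a rough sketch of the pipeline this abstract describes, the Python below filters a typed corpus against a spoken corpus with a similarity threshold to form an "unspeakable" corpus and then trains a discriminative selection classifier. TF-IDF features, cosine similarity, and logistic regression (via scikit-learn) are stand-in choices; the application does not specify these particular features or this classifier.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import cosine_similarity

def build_selection_classifier(spoken, typed, similarity_threshold=0.6):
    """Train a spoken-vs-unspeakable classifier for data selection.

    `spoken` and `typed` are lists of strings.  The feature representation
    and classifier here are illustrative substitutes for whatever the
    patent application actually uses.
    """
    vectorizer = TfidfVectorizer()
    spoken_vecs = vectorizer.fit_transform(spoken)
    typed_vecs = vectorizer.transform(typed)

    # Filter the typed corpus: drop every item whose nearest spoken item is
    # within the similarity threshold; what remains is the "unspeakable" corpus.
    max_sim = cosine_similarity(typed_vecs, spoken_vecs).max(axis=1)
    unspeakable = [t for t, s in zip(typed, max_sim) if s < similarity_threshold]

    # Train a discriminative classifier: spoken (positive) vs unspeakable (negative).
    texts = spoken + unspeakable
    labels = np.array([1] * len(spoken) + [0] * len(unspeakable))
    clf = LogisticRegression(max_iter=1000).fit(vectorizer.transform(texts), labels)
    return vectorizer, clf
```

The returned classifier could then score new typed text and keep only the items it judges spoken-like as additional language-model training data, which is the data-selection use the abstract points to.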