Quick Patent Search: search patents worldwide, free patent database for commercial use (IPRDB)

1. Granted patent

US5317673A Method and apparatus for context-dependent estimation of multiple probability distributions of phonetic classes with multilayer perceptrons in a speech recognition system
Legal status: Lapsed
Publication number: US5317673A
Publication date: 1994-05-31
Application number: US901716
Filing date: 1992-06-22
Applicants: Michael H. Cohen, Horacio E. Franco
Inventors: Michael H. Cohen, Horacio E. Franco
IPC classification: G10L15/14, G10L5/06
CPC classification: G10L15/144
Abstract: In a hidden Markov model-based speech recognition system, multilayer perceptrons (MLPs) are used in context-dependent estimation of a plurality of state-dependent observation probability distributions of phonetic classes. Estimation is obtained by the Bayesian factorization of the observation likelihood in terms of posterior probabilities of phone classes assuming the context and the input speech vector. The context-dependent estimation is employed as the state-dependent observation probabilities needed as parameter input to a hidden Markov model speech processor to identify the word sequence representing the unknown speech input of input speech vectors. Within the speech processor, models are provided which employ the observation probabilities in the recognition process. The number of context-dependent nets is reduced to a single net by sharing the units of the input layer and the hidden layer and the weights connecting them in the multilayer perceptron while providing one output layer for each relevant context. Each output layer is trained as an independent network on the specific examples of the corresponding context it represents. Training may be optimized at an intermediate set of weights between the context-independent-associated weights and the context-dependent associated weights to which training would normally converge.
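
Illustrative note (not part of the patent record): the "Bayesian factorization" mentioned in the abstract can be sketched with plain Bayes' rule. For a phone class q, a context c, and an input vector x, p(x | q, c) / p(x) = P(q | c, x) P(c | x) / (P(q | c) P(c)). The function name, argument names, and the choice to return the likelihood scaled by p(x) are assumptions made for this sketch; the abstract does not give the exact form used in the patent.

```python
def scaled_likelihood(p_class_given_ctx_x, p_ctx_given_x, p_class_given_ctx, p_ctx):
    """Return p(x | q, c) / p(x) computed from posteriors and priors.

    Bayes' rule gives:
        p(x | q, c) / p(x) = P(q | c, x) * P(c | x) / (P(q | c) * P(c))

    The two numerator terms would come from network outputs (posteriors);
    the two denominator terms are priors, e.g. relative frequencies in
    the training data. All argument names here are hypothetical.
    """
    for name, value in (
        ("p_class_given_ctx_x", p_class_given_ctx_x),
        ("p_ctx_given_x", p_ctx_given_x),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a probability in [0, 1]")
    for name, value in (
        ("p_class_given_ctx", p_class_given_ctx),
        ("p_ctx", p_ctx),
    ):
        if not 0.0 < value <= 1.0:
            raise ValueError(f"{name} must be a probability in (0, 1]")
    return p_class_given_ctx_x * p_ctx_given_x / (p_class_given_ctx * p_ctx)
```

Because p(x) is the same for every state at a given frame, a scaled likelihood of this kind can generally stand in for the true likelihood when comparing HMM paths.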

2. Patent application

US20120022873A1 Speech Recognition Language Models
Legal status: Under examination (published)
Publication number: US20120022873A1
Publication date: 2012-01-26
Application number: US13249180
Filing date: 2011-09-29
Applicants: Brandon M. Ballinger, Johan Schalkwyk, Michael H. Cohen, Cyril Georges Luc Allauzen
Inventors: Brandon M. Ballinger, Johan Schalkwyk, Michael H. Cohen, Cyril Georges Luc Allauzen
IPC classification: G10L21/00
CPC classification: G06F3/167, G06F3/04886, G06F17/277, G06F17/289, G10L15/005, G10L15/18, G10L15/183, G10L15/197, G10L15/22, G10L15/26, G10L15/265, G10L15/30, G10L2015/223, G10L2015/228
Abstract: Methods, computer program products and systems are described for forming a speech recognition language model. Multiple query-website relationships are determined by identifying websites that are determined to be relevant to queries using one or more search engines. Clusters are identified in the query-website relationships by connecting common queries and connecting common websites. A speech recognition language model is created for a particular website based on at least one of analyzing queries in a cluster that includes the website or analyzing webpage content of web pages in the cluster that includes the website.
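
Illustrative note (not part of the patent record): one simple reading of "connecting common queries and connecting common websites" is to treat the query-website relationships as a bipartite graph and take its connected components as clusters. The abstract does not say which clustering method the application uses, so the sketch below is only an assumption; the function name and the example domains are hypothetical.

```python
from collections import defaultdict


def cluster_query_site_pairs(pairs):
    """Group (query, website) pairs into clusters via connected components.

    Two queries end up in the same cluster when they share a website,
    and two websites when they share a query (directly or transitively).
    """
    adjacency = defaultdict(set)
    for query, site in pairs:
        q_node = ("query", query)
        s_node = ("site", site)
        adjacency[q_node].add(s_node)
        adjacency[s_node].add(q_node)

    seen = set()
    clusters = []
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first traversal of one connected component.
        stack = [start]
        seen.add(start)
        queries, sites = set(), set()
        while stack:
            node = stack.pop()
            kind, value = node
            (queries if kind == "query" else sites).add(value)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append({"queries": queries, "sites": sites})
    return clusters
```

The queries collected in a website's cluster could then serve as training text for that website's language model, which is the use the abstract describes.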

3. Granted patent

US06859776B1 Method and apparatus for optimizing a spoken dialog between a person and a machine
Legal status: Lapsed
Publication number: US06859776B1
Publication date: 2005-02-22
Application number: US09412173
Filing date: 1999-10-04
Applicants: Michael H. Cohen, Tracy D. Wax, Michael A. Prince, Steven C. Ehrlich
Inventors: Michael H. Cohen, Tracy D. Wax, Michael A. Prince, Steven C. Ehrlich
IPC classification: G06F3/16, H04M3/493, H04M7/12, H04M11/00, G10L21/00
CPC classification: H04M3/4938, H04M3/493, H04M3/4931, H04M7/12, H04M2201/40, H04M2201/60
Abstract: A network comprises a number of speech-enabled sites maintaining a number of voice pages. A central server on the network executes a voice browser which provides users with access to the sites using voice-activated hyperlinks. The server also maintains and brokers information associated with the users based on spoken dialogs between the users and the sites. In response to a user accessing a given ASR site, information about that user is provided by the server for use by that ASR site. The information is used by the ASR site to optimize a spoken dialog between the user and the ASR site by reducing the amount of information the user is required to provide during the dialog. Information about the user can thereby be shared between separate speech enabled sites, in a manner which is transparent to the user, in order to expedite the user's interaction with those sites.

4. Granted patent

US09171454B2 Magic wand
Legal status: In force
Publication number: US09171454B2
Publication date: 2015-10-27
Application number: US11939739
Filing date: 2007-11-14
Applicants: Andrew David Wilson, James E. Allard, Michael H. Cohen, Steven Drucker, Yu-Ting Kuo
Inventors: Andrew David Wilson, James E. Allard, Michael H. Cohen, Steven Drucker, Yu-Ting Kuo
IPC classification: G05B11/01, G08C17/00
CPC classification: G08C17/00, G08C2201/30, G08C2201/32, G08C2201/51
Abstract: The claimed subject matter relates to an architecture that can facilitate rich interaction with and/or management of environmental components included in an environment. The architecture can exist in whole or in part in a housing that can resemble a wand or similar object. The architecture can utilize one or more sensors from a collection of sensors to determine an orientation or gesture in connection with the wand, and can further issue an instruction to update a state of an environmental component based upon the orientation. In addition, the architecture can include an advisor component to provide contextual and/or comprehensive guidance in an intuitive manner.

5. Granted patent

US09026511B1 Call connection via document browsing
Legal status: In force
Publication number: US09026511B1
Publication date: 2015-05-05
Application number: US11169151
Filing date: 2005-06-29
Applicants: Michael H. Cohen, Maryam Kamvar, Shumeet Baluja
Inventors: Michael H. Cohen, Maryam Kamvar, Shumeet Baluja
IPC classification: G06F7/00, G06F17/30
CPC classification: G06F17/30864, G06F17/30896
Abstract: A system receives an indication of a document selected from a corpus of documents and determines a telephone number associated with the selected document. The system facilitates a voice call to the telephone number.

6. Granted patent

US08751217B2 Multi-modal input on an electronic device
Legal status: In force
Publication number: US08751217B2
Publication date: 2014-06-10
Application number: US13249172
Filing date: 2011-09-29
Applicants: Brandon M. Ballinger, Johan Schalkwyk, Michael H. Cohen, William J. Byrne, Gudmundur Hafsteinsson, Michael J. LeBeau
Inventors: Brandon M. Ballinger, Johan Schalkwyk, Michael H. Cohen, William J. Byrne, Gudmundur Hafsteinsson, Michael J. LeBeau
IPC classification: G06F17/20
CPC classification: G06F3/167, G06F3/04886, G06F17/277, G06F17/289, G10L15/005, G10L15/18, G10L15/183, G10L15/197, G10L15/22, G10L15/26, G10L15/265, G10L15/30, G10L2015/223, G10L2015/228
Abstract: A computer-implemented input-method editor process includes receiving a request from a user for an application-independent input method editor having written and spoken input capabilities, identifying that the user is about to provide spoken input to the application-independent input method editor, and receiving a spoken input from the user. The spoken input corresponds to input to an application and is converted to text that represents the spoken input. The text is provided as input to the application.

7. Granted patent

US07756708B2 Automatic language model update
Legal status: In force
Publication number: US07756708B2
Publication date: 2010-07-13
Application number: US11396770
Filing date: 2006-04-03
Applicants: Michael H. Cohen, Shumeet Baluja, Pedro J. Moreno
Inventors: Michael H. Cohen, Shumeet Baluja, Pedro J. Moreno
IPC classification: G10L15/06, G10L15/08, G10L15/00, G06F17/30
CPC classification: G10L15/065, G10L15/06, G10L15/063, G10L15/187, G10L15/26, G10L2015/0635
Abstract: A method for generating a speech recognition model includes accessing a baseline speech recognition model, obtaining information related to recent language usage from search queries, and modifying the speech recognition model to revise probabilities of a portion of a sound occurrence based on the information. The portion of a sound may include a word. Also, a method for generating a speech recognition model includes receiving at a search engine from a remote device an audio recording and a transcript that substantially represents at least a portion of the audio recording, synchronizing the transcript with the audio recording, extracting one or more letters from the transcript and extracting the associated pronunciation of the one or more letters from the audio recording, and generating a dictionary entry in a pronunciation dictionary.
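
Illustrative note (not part of the patent record): the first method in the abstract revises word probabilities in a baseline model using recent search queries. A minimal way to picture this is linear interpolation between a baseline unigram distribution and the relative word frequencies in recent queries. The abstract does not state the update rule, so the interpolation, the function name, and the `weight` parameter are all assumptions made for this sketch.

```python
from collections import Counter


def update_unigram_model(baseline, recent_queries, weight=0.2):
    """Blend baseline word probabilities with recent query word frequencies.

    baseline: dict mapping word -> probability (assumed to sum to 1).
    recent_queries: iterable of query strings.
    weight: share of probability mass given to the recent-query estimate
            (a hypothetical tuning parameter).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    counts = Counter(
        word for query in recent_queries for word in query.lower().split()
    )
    total = sum(counts.values())
    if total == 0:
        # No recent evidence: keep the baseline unchanged.
        return dict(baseline)
    vocabulary = set(baseline) | set(counts)
    return {
        word: (1.0 - weight) * baseline.get(word, 0.0)
        + weight * counts[word] / total
        for word in vocabulary
    }
```

Words that appear often in recent queries gain probability, and words unseen in the baseline enter the vocabulary with a small non-zero probability.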

8. Patent application

US20100121704A1 Activating Content Distribution
Legal status: Under examination (published)
Publication number: US20100121704A1
Publication date: 2010-05-13
Application number: US12617266
Filing date: 2009-11-12
Applicants: Vincent Vanhoucke, Michael H. Cohen, Manish G. Patel, Gudmundur Hafsteinsson
Inventors: Vincent Vanhoucke, Michael H. Cohen, Manish G. Patel, Gudmundur Hafsteinsson
IPC classification: G06Q30/00
CPC classification: G06Q30/0246, G06Q30/02, G06Q30/0255, G06Q30/0261
Abstract: A computer-implemented method for advertisement distribution includes receiving, in a computer system, an input from an advertiser that has previously registered an advertisement for on-demand activation. The input is generated based on the advertiser having an immediate availability and directs the computer system to initiate the on-demand activation substantially in real time with receiving the input. The method includes determining, using the computer system, a geographic location of the advertiser that corresponds to the immediate availability. The method includes defining, using the computer system, a target group to which the advertisement is to be presented, the target group identified based on at least the geographic location and the immediate availability. The method includes initiating the on-demand activation using the computer system, for receipt of the advertisement by at least part of the target group, the on-demand activation initiated substantially in real time with receiving the input.

9. Granted patent

US07627638B1 Verbal labels for electronic messages
Legal status: In force
Publication number: US07627638B1
Publication date: 2009-12-01
Application number: US11019431
Filing date: 2004-12-20
Applicants: Michael H. Cohen
Inventors: Michael H. Cohen
IPC classification: G06F15/16
CPC classification: G06Q10/107, H04L51/34
Abstract: Verbal labels for electronic messages, as well as systems and methods for making and using such labels, are disclosed. A verbal label is a label containing audio data (such as a digital audio file of a user's voice and/or a speaker template thereof) that is associated with one or more electronic messages. Verbal labels permit a user to more efficiently manipulate e-mail and other electronic messages by voice. For example, a user can add such labels verbally to an e-mail or to a group of e-mails, thereby permitting these messages to be sorted and retrieved more easily.

10. Granted patent

US5581655A Method for recognizing speech using linguistically-motivated hidden Markov models
Legal status: Lapsed
Publication number: US5581655A
Publication date: 1996-12-03
Application number: US589432
Filing date: 1996-01-22
Applicants: Michael H. Cohen, Mitchel Weintraub, Patti J. Price, Hy Murveit, Jared C. Bernstein
Inventors: Michael H. Cohen, Mitchel Weintraub, Patti J. Price, Hy Murveit, Jared C. Bernstein
IPC classification: G10L15/06, G09B19/04, G10L15/14, G10L15/18, G10L5/06
CPC classification: G10L15/187, G09B19/04, G10L15/142
Abstract: An automatic speech recognition methodology, wherein words are modeled as probabilistic networks of allophones, collects nodes in the probabilistic network into equivalence classes when those nodes have the same allophonic choices governed by the same phonological rules. The allophonic choices allow for representation of dialectic pronunciation variations between different speakers. Training data is shared among nodes in an equivalence class so that accurate pronunciation probabilities may be determined even for words for which there is only a limited amount of training data. A method is used to determine probabilities for each of a multitude of pronunciation models for each word in the vocabulary, based on automatic extraction of linguistic knowledge from sets of phonological rules, in order to robustly and accurately model dialectal variation.
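
Illustrative note (not part of the patent record): the abstract says training data is shared among nodes in an equivalence class so that rarely seen words still get usable pronunciation probabilities. A minimal sketch of that idea is to pool the observed allophone choices of all nodes in a class and give every node the pooled relative frequencies. The node identifiers, class labels, and function name below are hypothetical, and the patent may estimate the probabilities differently.

```python
from collections import Counter, defaultdict


def pooled_choice_probabilities(observations, node_to_class):
    """Estimate allophone-choice probabilities shared within each class.

    observations: iterable of (node_id, chosen_allophone) pairs seen in
                  training data.
    node_to_class: dict mapping node_id -> equivalence-class label.
    Returns a dict mapping node_id -> {allophone: probability}.
    """
    class_counts = defaultdict(Counter)
    for node, choice in observations:
        class_counts[node_to_class[node]][choice] += 1

    probabilities = {}
    for node, cls in node_to_class.items():
        counts = class_counts[cls]
        total = sum(counts.values())
        # Every node in a class receives the same pooled estimate,
        # including nodes with no observations of their own.
        probabilities[node] = (
            {choice: n / total for choice, n in counts.items()} if total else {}
        )
    return probabilities
```

With this pooling, a node from a word that never occurs in the training data inherits the estimate of its class instead of having no estimate at all.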
