Patent Quick Search - quickly search global patents in a free patent database for commercial use - IPRDB

1. Granted invention patent

US07251603B2 Audio-only backoff in audio-visual speech recognition system (Status: in force)
Publication (announcement) No.: US07251603B2
Publication (announcement) date: 2007-07-31
Application No.: US10601350
Filing date: 2003-06-23
Applicant(s): Jonathan H. Connell, Norman Haas, Etienne Marcheret, Chalapathy Venkata Neti, Gerasimos Potamianos
Inventor(s): Jonathan H. Connell, Norman Haas, Etienne Marcheret, Chalapathy Venkata Neti, Gerasimos Potamianos
IPC classification: G10L21/00
CPC classification: G10L15/25
Abstract: Techniques for performing audio-visual speech recognition, with improved recognition performance, in a degraded visual environment. For example, in one aspect of the invention, a technique for use in accordance with an audio-visual speech recognition system for improving a recognition performance thereof includes the steps/operations of: (i) selecting between an acoustic-only data model and an acoustic-visual data model based on a condition associated with a visual environment; and (ii) decoding at least a portion of an input spoken utterance using the selected data model. Advantageously, during periods of degraded visual conditions, the audio-visual speech recognition system is able to decode (recognize) input speech data using audio-only data, thus avoiding recognition inaccuracies that may result from performing speech recognition based on acoustic-visual data models and degraded visual data.
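The two steps in the abstract of US07251603B2 (select a data model from the visual condition, then decode with it) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `Frame` type, the `face_confidence` score, the 0.5 threshold, and the two decoder callables.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Frame:
    audio: Sequence[float]    # acoustic feature vector
    visual: Sequence[float]   # visual (e.g. mouth-region) feature vector
    face_confidence: float    # hypothetical visual-quality score in [0, 1]


def select_model(frame: Frame, threshold: float = 0.5) -> str:
    """Step (i): choose a data model from the visual condition (assumed score)."""
    return "audio_visual" if frame.face_confidence >= threshold else "audio_only"


def decode(frames: List[Frame],
           av_decoder: Callable[[Sequence[float], Sequence[float]], str],
           audio_decoder: Callable[[Sequence[float]], str],
           threshold: float = 0.5) -> List[str]:
    """Step (ii): decode each frame with whichever model was selected for it."""
    output = []
    for frame in frames:
        if select_model(frame, threshold) == "audio_visual":
            output.append(av_decoder(frame.audio, frame.visual))
        else:
            # Back off to audio-only decoding while the visual data is degraded.
            output.append(audio_decoder(frame.audio))
    return output
```

In this sketch the choice is made per frame. The abstract only requires that the selected model decode "at least a portion" of the utterance, so a real system could equally switch per segment.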

2. Granted invention patent

US06594629B1 Methods and apparatus for audio-visual speech detection and recognition (Status: in force)
Publication (announcement) No.: US06594629B1
Publication (announcement) date: 2003-07-15
Application No.: US09369707
Filing date: 1999-08-06
Applicant(s): Sankar Basu, Philippe Christian de Cuetos, Stephane Herman Maes, Chalapathy Venkata Neti, Andrew William Senior
Inventor(s): Sankar Basu, Philippe Christian de Cuetos, Stephane Herman Maes, Chalapathy Venkata Neti, Andrew William Senior
IPC classification: G10L15/00
CPC classification: G06K9/00228, G06K9/00335, G10L15/25, G10L25/78
Abstract: In a first aspect of the invention, methods and apparatus for providing speech recognition comprise the steps of processing a video signal associated with an arbitrary content video source, processing an audio signal associated with the video signal, and decoding the processed audio signal in conjunction with the processed video signal to generate a decoded output signal representative of the audio signal. In a second aspect of the invention, methods and apparatus for providing speech detection in accordance with a speech recognition system comprise the steps of processing a video signal associated with a video source to detect whether one or more features associated with the video signal are representative of speech, and processing an audio signal associated with the video signal in accordance with the speech recognition system to generate a decoded output signal representative of the audio signal when the one or more features associated with the video signal are representative of speech. Speech detection may also be performed using information from both the video path and the audio path simultaneously.
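The second aspect of US06594629B1 (decode the audio only when video features are representative of speech) can be sketched roughly as follows. The abstract does not say which video features are used, so the cue below, the variance of a mouth-opening measurement over a segment, is purely a hypothetical stand-in, as are the function names and the `min_variance` default.

```python
from statistics import pvariance


def visual_speech_detected(mouth_openings, min_variance=0.01):
    """Hypothetical cue: moving lips give a varying mouth-opening measurement."""
    return len(mouth_openings) > 1 and pvariance(mouth_openings) >= min_variance


def gated_decode(segments, audio_decoder, min_variance=0.01):
    """Decode the audio of a segment only when the video suggests speech.

    `segments` is a list of (mouth_openings, audio) pairs. Segments without
    detected visual speech yield None instead of a decoded result.
    """
    results = []
    for mouth_openings, audio in segments:
        if visual_speech_detected(mouth_openings, min_variance):
            results.append(audio_decoder(audio))
        else:
            results.append(None)
    return results
```

Gating on the video path means the recognizer is not run on background noise when nobody is visibly speaking. The abstract also mentions detection that uses the video and audio paths together, which this sketch does not attempt.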

3. Granted invention patent

US5953701A Speech recognition models combining gender-dependent and gender-independent phone states and using phonetic-context-dependence (Status: lapsed)
Publication (announcement) No.: US5953701A
Publication (announcement) date: 1999-09-14
Application No.: US10466
Filing date: 1998-01-22
Applicant(s): Chalapathy Venkata Neti, Salim Estephan Roukos
Inventor(s): Chalapathy Venkata Neti, Salim Estephan Roukos
IPC classification: G10L5/06
CPC classification: G10L15/07, G10L15/142
Abstract: A method of gender dependent speech recognition includes the steps of identifying phone state models common to both genders, identifying gender specific phone state models, identifying a gender of a speaker and recognizing acoustic data from the speaker. A method of constructing a gender-dependent speech recognition model includes the steps of providing training data of a known gender, aligning the training data, tagging the training data with a gender to create gender-tagged data, determining a gender question at a node to determine gender dependence of the gender-tagged data, determining a phonetic context question at the node to determine phonetic context dependence of the gender-tagged data, determining a highest value of an evaluation function between the gender dependence and the phonetic context dependence to determine which dependence is a dominant dependence, splitting the data of the dominant dependence into child nodes according to likelihood criteria, comparing the highest value with a threshold value to determine if additional splitting is necessary, repeating these steps for each child node until the highest value is below the threshold value and counting the nodes having gender dependence to determine an overall gender dependence level. A gender-dependent speech recognition system includes an input device for inputting speech to a preprocessor. The preprocessor converts the speech into acoustic data, and a processor identifies gender-dependent phone state models and phone state models common to both genders. The phone state models are stored in a memory device wherein the processor recognizes the speech in accordance with the phone state models.
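The model-construction loop in the abstract of US5953701A (ask a gender question and a phonetic context question at each node, split on whichever scores highest, stop when the best score falls below a threshold, then count the gender-dependent nodes) can be sketched as below. The abstract does not define the evaluation function, so this sketch swaps in the reduction in sum of squared error on a one-dimensional feature as an assumed stand-in for a likelihood criterion. The sample layout (`x`, `gender`, `left`) and all function names are hypothetical.

```python
from statistics import pvariance


def sse(values):
    """Sum of squared deviations from the mean (0 for fewer than two values)."""
    return pvariance(values) * len(values) if len(values) > 1 else 0.0


def evaluate(samples, question):
    """Assumed evaluation function: SSE reduction achieved by a yes/no question."""
    yes = [s for s in samples if question(s)]
    no = [s for s in samples if not question(s)]
    if not yes or not no:
        return 0.0, yes, no
    before = sse([s["x"] for s in samples])
    after = sse([s["x"] for s in yes]) + sse([s["x"] for s in no])
    return before - after, yes, no


def grow(samples, questions, threshold):
    """Return (gender-dependent split nodes, total split nodes)."""
    scored = [(evaluate(samples, ask), kind) for kind, ask in questions]
    (best, yes, no), kind = max(scored, key=lambda item: item[0][0])
    if best < threshold or not yes or not no:
        return 0, 0  # highest value is below the threshold: stop splitting
    g_yes, n_yes = grow(yes, questions, threshold)
    g_no, n_no = grow(no, questions, threshold)
    return int(kind == "gender") + g_yes + g_no, 1 + n_yes + n_no


# Toy gender-tagged data: "x" is a 1-D acoustic feature, "left" the left phone context.
SAMPLES = [
    {"x": 10.0, "gender": "F", "left": "a"}, {"x": 10.0, "gender": "F", "left": "a"},
    {"x": 12.0, "gender": "F", "left": "b"}, {"x": 12.0, "gender": "F", "left": "b"},
    {"x": 0.0, "gender": "M", "left": "a"}, {"x": 0.0, "gender": "M", "left": "a"},
    {"x": 0.0, "gender": "M", "left": "b"}, {"x": 0.0, "gender": "M", "left": "b"},
]
QUESTIONS = [
    ("gender", lambda s: s["gender"] == "F"),
    ("context", lambda s: s["left"] == "a"),
]
```

On this toy data, `grow(SAMPLES, QUESTIONS, 1.0)` splits the root on gender and then splits the female branch on context, so one of the two split nodes is gender-dependent. That ratio is one possible reading of the "overall gender dependence level" named in the abstract.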

4. Granted invention patent

US07295979B2 Language context dependent data labeling (Status: in force)
Publication (announcement) No.: US07295979B2
Publication (announcement) date: 2007-11-13
Application No.: US09790296
Filing date: 2001-02-22
Applicant(s): Chalapathy Venkata Neti, Nitendra Rajput, L. Venkata Subramaniam, Ashish Verma
Inventor(s): Chalapathy Venkata Neti, Nitendra Rajput, L. Venkata Subramaniam, Ashish Verma
IPC classification: G10L15/06, G10L15/00
CPC classification: G10L15/06, G10L15/187
Abstract: Bootstrapping of a system from one language to another often works well when the two languages share the similar acoustic space. However, when the new language has sounds that do not occur in the language from which the bootstrapping is to be done, bootstrapping does not produce good initial models and the new language data is not properly aligned to these models. The present invention provides techniques to generate context dependent labeling of the new language data using the recognition system of another language. Then, this labeled data is used to generate models for the new language phones.

5. Granted invention patent

US06964023B2 System and method for multi-modal focus detection, referential ambiguity resolution and mood classification using multi-modal input (Status: in force)
Publication (announcement) No.: US06964023B2
Publication (announcement) date: 2005-11-08
Application No.: US09776654
Filing date: 2001-02-05
Applicant(s): Stephane Herman Maes, Chalapathy Venkata Neti
Inventor(s): Stephane Herman Maes, Chalapathy Venkata Neti
IPC classification: G06T1/00, G06F3/00, G06F3/01, G06F3/033, G06F3/048, G06F9/45, G06F17/00, G06K9/00, G06T7/00, G06T7/20, G10L15/22, G10L15/24
CPC classification: G06K9/00248, G06F3/0481, G10L15/24, G10L2015/227, Y10S715/966
Abstract: Systems and methods are provided for performing focus detection, referential ambiguity resolution and mood classification in accordance with multi-modal input data, in varying operating conditions, in order to provide an effective conversational computing environment for one or more users.

6. Granted invention patent

US06816836B2 Method and apparatus for audio-visual speech detection and recognition (Status: in force)
Publication (announcement) No.: US06816836B2
Publication (announcement) date: 2004-11-09
Application No.: US10231676
Filing date: 2002-08-30
Applicant(s): Sankar Basu, Philippe Christian de Cuetos, Stephane Herman Maes, Chalapathy Venkata Neti, Andrew William Senior
Inventor(s): Sankar Basu, Philippe Christian de Cuetos, Stephane Herman Maes, Chalapathy Venkata Neti, Andrew William Senior
IPC classification: G10L15/00
CPC classification: G06K9/00228, G06K9/00335, G10L15/25, G10L25/78
Abstract: Techniques for providing speech recognition comprise the steps of processing a video signal associated with an arbitrary content video source, processing an audio signal associated with the video signal, and recognizing at least a portion of the processed audio signal, using at least a portion of the processed video signal, to generate an output signal representative of the audio signal.

7. Granted invention patent

US06219640B1 Methods and apparatus for audio-visual speaker recognition and utterance verification (Status: in force)
Publication (announcement) No.: US06219640B1
Publication (announcement) date: 2001-04-17
Application No.: US09369706
Filing date: 1999-08-06
Applicant(s): Sankar Basu, Homayoon S. M. Beigi, Stephane Herman Maes, Benoit Emmanuel Ghislain Maison, Chalapathy Venkata Neti, Andrew William Senior
Inventor(s): Sankar Basu, Homayoon S. M. Beigi, Stephane Herman Maes, Benoit Emmanuel Ghislain Maison, Chalapathy Venkata Neti, Andrew William Senior
IPC classification: G10L15/00
CPC classification: G06K9/6293, G06K9/00221, G06K9/00885, G07C9/00158, G10L2015/226
Abstract: Methods and apparatus for performing speaker recognition comprise processing a video signal associated with an arbitrary content video source and processing an audio signal associated with the video signal. Then, an identification and/or verification decision is made based on the processed audio signal and the processed video signal. Various decision making embodiments may be employed including, but not limited to, a score combination approach, a feature combination approach, and a re-scoring approach. In another aspect of the invention, a method of verifying a speech utterance comprises processing a video signal associated with a video source and processing an audio signal associated with the video signal. Then, the processed audio signal is compared with the processed video signal to determine a level of correlation between the signals. This is referred to as unsupervised utterance verification. In a supervised utterance verification embodiment, the processed video signal is compared with a script representing an audio signal associated with the video signal to determine a level of correlation between the signals.
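Of the decision-making embodiments listed in the abstract of US06219640B1, the score combination approach is the simplest to illustrate. The sketch below assumes a weighted sum of per-speaker audio and video scores. The abstract does not specify the combination rule, so the linear weighting, the `audio_weight` parameter, the acceptance threshold and the function names are all hypothetical.

```python
def combine_scores(audio_scores, video_scores, audio_weight=0.5):
    """Fuse per-speaker scores with an assumed weighted sum."""
    speakers = audio_scores.keys() & video_scores.keys()
    return {
        spk: audio_weight * audio_scores[spk] + (1.0 - audio_weight) * video_scores[spk]
        for spk in speakers
    }


def identify(audio_scores, video_scores, audio_weight=0.5):
    """Identification: pick the enrolled speaker with the highest fused score."""
    fused = combine_scores(audio_scores, video_scores, audio_weight)
    return max(fused, key=fused.get)


def verify(claimed, audio_scores, video_scores, threshold, audio_weight=0.5):
    """Verification: accept the claimed identity if its fused score is high enough."""
    fused = combine_scores(audio_scores, video_scores, audio_weight)
    return fused[claimed] >= threshold
```

For example, with audio scores `{"alice": 0.6, "bob": 0.7}` and video scores `{"alice": 0.9, "bob": 0.2}`, equal weighting favors "alice" even though the audio score alone favors "bob". Lowering `audio_weight` would be one way to lean on the video path when the audio is noisy.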
