Quick Patent Search - fast search of global patents, free commercial patent database - IPRDB

1. Granted invention patent

US07454342B2 Coupled hidden Markov model (CHMM) for continuous audiovisual speech recognition (in force)
Publication No.: US07454342B2
Publication date: 2008-11-18
Application No.: US10392709
Filing date: 2003-03-19
Applicants: Ara Victor Nefian, Xiaoxing Liu, Xiaobo Pi, Luhong Liang, Yibao Zhao
Inventors: Ara Victor Nefian, Xiaoxing Liu, Xiaobo Pi, Luhong Liang, Yibao Zhao
IPC classification: G10L15/14
CPC classification: G06K9/6293, G06K9/6297, G10L15/142, G10L15/25
Abstract: Method and apparatus for an audiovisual continuous speech recognition (AVCSR) system using a coupled hidden Markov model (CHMM) are described herein. In one aspect, an exemplary process includes receiving an audio data stream and a video data stream, and performing continuous speech recognition based on the audio and video data streams using a plurality of hidden Markov models (HMMs), a node of each of the HMMs at a time slot being subject to one or more nodes of related HMMs at a preceding time slot. Other methods and apparatuses are also described.

2. Granted invention patent

US07472063B2 Audio-visual feature fusion and support vector machine useful for continuous speech recognition (in force)
Publication No.: US07472063B2
Publication date: 2008-12-30
Application No.: US10326368
Filing date: 2002-12-19
Applicants: Ara V. Nefian, Xiaobo Pi, Luhong Liang, Xiaoxing Liu, Yibao Zhao
Inventors: Ara V. Nefian, Xiaobo Pi, Luhong Liang, Xiaoxing Liu, Yibao Zhao
IPC classification: G10L15/14, G06K9/70
CPC classification: G06K9/6293, G10L15/25
Abstract: A speech recognition method includes several embodiments describing application of support vector machine analysis to a mouth region. Lip position can be accurately determined and used in conjunction with synchronous or asynchronous audio data to enhance speech recognition probabilities.

3. Invention patent application

US20050027530A1 Audio-visual speaker identification using coupled hidden markov models (under examination, published)
Publication No.: US20050027530A1
Publication date: 2005-02-03
Application No.: US10631424
Filing date: 2003-07-31
Applicants: Tieyan Fu, Xiaoxing Liu, Luhong Liang, Xiaobo Pi, Ara Nefian
Inventors: Tieyan Fu, Xiaoxing Liu, Luhong Liang, Xiaobo Pi, Ara Nefian
IPC classification: G06K9/62, G06K9/68, G10L15/24, G10L17/00, G10L15/14
CPC classification: G06K9/6293, G06K9/6297, G10L15/24, G10L17/10, G10L17/16
Abstract: A phoneme and a viseme of a person may be modeled using a coupled hidden Markov model. The coupled hidden Markov model and a second model may be compared to identify the person.

4. Granted invention patent

US07346497B2 High-order entropy error functions for neural classifiers (lapsed)
Publication No.: US07346497B2
Publication date: 2008-03-18
Application No.: US10332651
Filing date: 2001-05-08
Applicants: Xiaobo Pi, Ying Jia
Inventors: Xiaobo Pi, Ying Jia
IPC classification: G10L11/00
CPC classification: G10L15/16, G10L15/063, G10L15/10
Abstract: An automatic speech recognition system comprising a speech decoder to resolve phone and word level information, a vector generator to generate information vectors on which a confidence measure is based by a neural network classifier (ANN). An error signal is designed which is not subject to false saturation or over specialization. The error signal is integrated into an error function which is back propagated through the ANN.

5. Invention patent application

US20050015251A1 High-order entropy error functions for neural classifiers (lapsed)
Publication No.: US20050015251A1
Publication date: 2005-01-20
Application No.: US10332651
Filing date: 2001-05-08
Applicants: Xiaobo Pi, Ying Jia
Inventors: Xiaobo Pi, Ying Jia
IPC classification: G10L15/06, G10L15/10, G10L15/16, G06E1/00, G06E3/00, G06F15/18, G06G7/00, G10L15/00
CPC classification: G10L15/16, G10L15/063, G10L15/10
Abstract: An automatic speech recognition system comprising a speech decoder to resolve phone and word level information, a vector generator to generate information vectors on which a confidence measure is based by a neural network classifier (ANN). An error signal is designed which is not subject to false saturation or over specialization. The error signal is integrated into an error function which is back propagated through the ANN.

6. Granted invention patent

US07437286B2 Voice barge-in in telephony speech recognition (in force)
Publication No.: US07437286B2
Publication date: 2008-10-14
Application No.: US10204034
Filing date: 2000-12-27
Applicants: Xiaobo Pi, Ying Jia
Inventors: Xiaobo Pi, Ying Jia
IPC classification: G10L11/02
CPC classification: G01S13/66, G01S13/726, G01S13/9303, G10L15/22, G10L25/78, G10L2021/02087, G10L2025/783
Abstract: An interactive voice response system is described that supports full duplex data transfer to enable the playing of a voice prompt to a user of telephony system while the system listens for voice barge-in from the user. The system includes a speech detection module that may utilize various criteria such as frame energy magnitude and duration thresholds to detect speech. The system also includes an automatic speech recognition engine. When the automatic speech recognition engine recognizes a segment of speech, a feature extraction module may be used to subtract a prompt echo spectrum, which corresponds to the currently playing voice prompt, from an echo-dirtied speech spectrum recorded by the system. In order to improve spectrum subtraction, an estimation of the time delay between the echo-dirtied speech and the prompt echo may also be performed.
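
Illustrative note: the abstract above mentions detecting speech with "frame energy magnitude and duration thresholds" but does not give an algorithm. The following is a minimal sketch of one way such a detector could work, not the patented method. The function names, the frame length of 160 samples, the energy threshold, and the minimum run of 3 frames are all assumptions chosen for illustration.

```python
from typing import List, Optional


def frame_energies(samples: List[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each full, non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_speech_onset(
    samples: List[float],
    frame_len: int = 160,            # assumed: 20 ms at 8 kHz telephony audio
    energy_threshold: float = 0.01,  # assumed energy (magnitude) threshold
    min_frames: int = 3,             # assumed duration threshold, in frames
) -> Optional[int]:
    """Return the index of the first frame of the first run of at least
    `min_frames` consecutive frames whose energy exceeds the threshold,
    or None if no such run exists."""
    run = 0
    for idx, energy in enumerate(frame_energies(samples, frame_len)):
        if energy > energy_threshold:
            run += 1
            if run >= min_frames:
                return idx - min_frames + 1
        else:
            run = 0  # a quiet frame resets the duration count
    return None


if __name__ == "__main__":
    silence = [0.0] * 800   # 5 quiet frames
    speech = [0.5] * 640    # 4 loud frames
    print(detect_speech_onset(silence + speech))
```

The duration threshold is what keeps a short loud burst, such as a click on the line, from being treated as a barge-in: a run shorter than `min_frames` is discarded when the next quiet frame arrives.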

7. Granted invention patent

US08473290B2 Voice barge-in in telephony speech recognition (in force)
Publication No.: US08473290B2
Publication date: 2013-06-25
Application No.: US12197801
Filing date: 2008-08-25
Applicants: Xiaobo Pi, Ying Jia
Inventors: Xiaobo Pi, Ying Jia
IPC classification: G10L11/02
CPC classification: G01S13/66, G01S13/726, G01S13/9303, G10L15/22, G10L25/78, G10L2021/02087, G10L2025/783
Abstract: An interactive voice response system is described that supports full duplex data transfer to enable the playing of a voice prompt to a user of telephony system while the system listens for voice barge-in from the user. The system includes a speech detection module that may utilize various criteria such as frame energy magnitude and duration thresholds to detect speech. The system also includes an automatic speech recognition engine. When the automatic speech recognition engine recognizes a segment of speech, a feature extraction module may be used to subtract a prompt echo spectrum, which corresponds to the currently playing voice prompt, from an echo-dirtied speech spectrum recorded by the system. In order to improve spectrum subtraction, an estimation of the time delay between the echo-dirtied speech and the prompt echo may also be performed.

8. Invention patent application

US20080310601A1 VOICE BARGE-IN IN TELEPHONY SPEECH RECOGNITION (in force)
Publication No.: US20080310601A1
Publication date: 2008-12-18
Application No.: US12197801
Filing date: 2008-08-25
Applicants: Xiaobo Pi, Ying Jia
Inventors: Xiaobo Pi, Ying Jia
IPC classification: H04M1/64
CPC classification: G01S13/66, G01S13/726, G01S13/9303, G10L15/22, G10L25/78, G10L2021/02087, G10L2025/783
Abstract: An interactive voice response system is described that supports full duplex data transfer to enable the playing of a voice prompt to a user of telephony system while the system listens for voice barge-in from the user. The system includes a speech detection module that may utilize various criteria such as frame energy magnitude and duration thresholds to detect speech. The system also includes an automatic speech recognition engine. When the automatic speech recognition engine recognizes a segment of speech, a feature extraction module may be used to subtract a prompt echo spectrum, which corresponds to the currently playing voice prompt, from an echo-dirtied speech spectrum recorded by the system. In order to improve spectrum subtraction, an estimation of the time delay between the echo-dirtied speech and the prompt echo may also be performed.

9. Granted invention patent

US07072750B2 Method and apparatus for rejection of speech recognition results in accordance with confidence level (lapsed)
Publication No.: US07072750B2
Publication date: 2006-07-04
Application No.: US10332650
Filing date: 2001-05-08
Applicants: Xiaobo Pi, Ying Jia
Inventors: Xiaobo Pi, Ying Jia
IPC classification: G06F7/00
CPC classification: G10L15/14
Abstract: An automatic speech recognition system for continuous speech recognition of vocabulary words for an autoattendant system providing hands-free telephone calling and utilizing a vocabulary comprising numbers or names of people to be called using known techniques for automatic speech recognition models of word sequencing resulting in high confidence levels of recognition.
