Patent Quick Search: fast search of global patents, free commercial-use patent database - IPRDB

1. Granted invention patent

US06317710B1 Multimedia search apparatus and method for searching multimedia content using speaker detection by audio data (Status: in force)
Publication No.: US06317710B1
Publication date: 2001-11-13
Application No.: US09353192
Filing date: 1999-07-14
Applicants: Qian Huang, Ivan Magrin-Chagnolleau, Sarangarajan Parthasarathy, Aaron Edward Rosenberg
Inventors: Qian Huang, Ivan Magrin-Chagnolleau, Sarangarajan Parthasarathy, Aaron Edward Rosenberg
IPC classification: G01L17/00
CPC classification: G10L17/00
Abstract: A multimedia search apparatus and method for searching multimedia content using speaker detection to segment the multimedia content. The multimedia search apparatus receives a search request from a user device. The search request identifies the target speaker for which the search is to be conducted. Based on the search request, the multimedia search apparatus retrieves multimedia content from a multimedia database. The multimedia search apparatus retrieves models, such as Gaussian Mixture Models (GMMs), from a model storage device, corresponding to the target speaker and background data. Based on the retrieved models, the multimedia search device searches the audio data of the multimedia content and segments the audio data. The segments are identified by calculating an average normalized score for a block of frames of the audio data and determining if the average normalized score for the block of frames exceeds one or more predetermined thresholds.
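
The block-scoring step in the abstract above can be illustrated with a minimal sketch. It assumes diagonal-covariance GMMs, takes the "normalized score" to be the per-frame log-likelihood ratio between the target-speaker model and the background model, and uses a single threshold; the function names, the block size, and the toy one-dimensional features are all hypothetical and are not taken from the patent.

```python
import numpy as np

def gmm_log_likelihood(frames, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D); weights: (K,); means and variances: (K, D).
    """
    diff = frames[:, None, :] - means[None, :, :]               # (T, K, D)
    log_gauss = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )                                                           # (T, K)
    weighted = log_gauss + np.log(weights)
    # Log-sum-exp over mixture components for numerical stability.
    peak = weighted.max(axis=1, keepdims=True)
    return peak[:, 0] + np.log(np.exp(weighted - peak).sum(axis=1))

def detect_target_blocks(frames, target, background, block_size, threshold):
    """Return (start, end) frame ranges whose average normalized score exceeds threshold."""
    # Assumption: normalized score = target log-likelihood minus background log-likelihood.
    scores = gmm_log_likelihood(frames, *target) - gmm_log_likelihood(frames, *background)
    segments = []
    for start in range(0, len(scores) - block_size + 1, block_size):
        if scores[start:start + block_size].mean() > threshold:
            end = start + block_size
            if segments and segments[-1][1] == start:
                segments[-1] = (segments[-1][0], end)   # extend the adjacent segment
            else:
                segments.append((start, end))
    return segments

# Toy data: the "target speaker" occupies frames 20..39.
frames = np.concatenate([np.zeros((20, 1)), np.full((20, 1), 5.0), np.zeros((20, 1))])
target = (np.array([1.0]), np.array([[5.0]]), np.array([[1.0]]))
background = (np.array([1.0]), np.array([[0.0]]), np.array([[1.0]]))
segments = detect_target_blocks(frames, target, background, block_size=10, threshold=0.0)
# segments == [(20, 40)]
```

Averaging over a block rather than thresholding single frames smooths out frame-level noise, at the cost of segment boundaries that are only accurate to the block size.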

2. Granted invention patent

US06405166B1 Multimedia search apparatus and method for searching multimedia content using speaker detection by audio data (Status: in force)
Publication No.: US06405166B1
Publication date: 2002-06-11
Application No.: US09976023
Filing date: 2001-10-15
Applicants: Qian Huang, Ivan Magrin-Chagnolleau, Sarangarajan Parthasarathy, Aaron Edward Rosenberg
Inventors: Qian Huang, Ivan Magrin-Chagnolleau, Sarangarajan Parthasarathy, Aaron Edward Rosenberg
IPC classification: G10L17/00
CPC classification: G10L17/00
Abstract: A multimedia search apparatus and method for searching multimedia content using speaker detection to segment the multimedia content. The multimedia search apparatus receives a search request from a user device. The search request identifies the target speaker for which the search is to be conducted. Based on the search request, the multimedia search apparatus retrieves multimedia content from a multimedia database. The multimedia search apparatus retrieves models, such as Gaussian Mixture Models (GMMs), from a model storage device, corresponding to the target speaker and background data. Based on the retrieved models, the multimedia search device searches the multimedia data of the multimedia content and segments the multimedia data. The segments are identified by calculating an average normalized score for a block of frames of the multimedia data and determining if the average normalized score for the block of frames exceeds one or more predetermined thresholds.

3. Invention patent application

US20120185237A1 SYSTEM AND METHOD OF PERFORMING USER-SPECIFIC AUTOMATIC SPEECH RECOGNITION (Status: in force)
Publication No.: US20120185237A1
Publication date: 2012-07-19
Application No.: US13429946
Filing date: 2012-03-26
Applicants: Bojana GAJIC, Shrikanth Sambasivan Narayanan, Sarangarajan Parthasarathy, Richard Cameron Rose, Aaron Edward Rosenberg
Inventors: Bojana GAJIC, Shrikanth Sambasivan Narayanan, Sarangarajan Parthasarathy, Richard Cameron Rose, Aaron Edward Rosenberg
IPC classification: G06F17/20, G10L17/00
CPC classification: G10L15/07, G10L15/20
Abstract: Speech recognition models are dynamically re-configurable based on user information, application information, background information such as background noise, and transducer information such as transducer response characteristics, to provide users with alternate input modes to keyboard text entry. Word recognition lattices are generated for each data field of an application and dynamically concatenated into a single word recognition lattice. A language model is applied to the concatenated word recognition lattice to determine the relationships between the word recognition lattices, and the process is repeated until the generated word recognition lattices are acceptable or differ from a predetermined value only by a threshold amount. These techniques of dynamic re-configurable speech recognition provide for deployment of speech recognition on small devices such as mobile phones and personal digital assistants, as well as in environments such as office, home or vehicle, while maintaining the accuracy of the speech recognition.
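
The lattice concatenation mentioned in the abstract above can be pictured with a small sketch. The abstract does not specify a data structure, so everything here is an assumption made for illustration: a lattice is a list of scored word edges between integer nodes, concatenation merges the end node of one field's lattice with the start node of the next, and a simple best-path search stands in for decoding. The language-model pass and the repeat-until-acceptable loop are not modeled.

```python
from dataclasses import dataclass, field

@dataclass
class Lattice:
    """Hypothetical word lattice: node 0 is the start, node num_nodes - 1 the end."""
    num_nodes: int
    edges: list = field(default_factory=list)   # (src, dst, word, log_score)

def concatenate(lattices):
    """Chain per-field lattices so each one starts where the previous one ends."""
    combined = Lattice(num_nodes=1)
    for lat in lattices:
        offset = combined.num_nodes - 1          # shared node: previous end == next start
        combined.edges.extend(
            (src + offset, dst + offset, word, score)
            for src, dst, word, score in lat.edges
        )
        combined.num_nodes = offset + lat.num_nodes
    return combined

def best_path(lattice):
    """Highest-scoring word sequence, assuming every edge goes from a lower to a higher node id."""
    best = {0: (0.0, [])}
    for src, dst, word, score in sorted(lattice.edges):
        if src not in best:
            continue                             # node unreachable from the start
        candidate = (best[src][0] + score, best[src][1] + [word])
        if dst not in best or candidate[0] > best[dst][0]:
            best[dst] = candidate
    return best[lattice.num_nodes - 1][1]

# Two form fields ("city" and "state"), each with its own small lattice.
city = Lattice(2, [(0, 1, "boston", -0.2), (0, 1, "austin", -1.6)])
state = Lattice(2, [(0, 1, "massachusetts", -0.3), (0, 1, "texas", -1.1)])
combined = concatenate([city, state])
words = best_path(combined)   # ["boston", "massachusetts"]
```

In this toy form the fields are scored independently; a language model applied across the combined lattice is what would let a hypothesis such as "austin" reinforce "texas".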

4. Granted invention patent

US07664636B1 System and method for indexing voice mail messages by speaker (Status: in force)
Publication No.: US07664636B1
Publication date: 2010-02-16
Application No.: US09550686
Filing date: 2000-04-17
Applicants: Julia Hirschberg, Sarangarajan Parthasarathy, Aaron Edward Rosenberg, Stephen Whittaker
Inventors: Julia Hirschberg, Sarangarajan Parthasarathy, Aaron Edward Rosenberg, Stephen Whittaker
IPC classification: G10L15/00
CPC classification: H04M3/533, G10L17/00, G10L17/04
Abstract: The invention provides a system and method for indexing and organizing voice mail messages by the speaker of the message. One or more speaker models are created from voice mail messages received. As additional messages are left, each of the new messages is compared with existing speaker models to determine the identity of its caller. The voice mail messages are organized within a user's mailbox by caller. Unknown callers may be identified and tagged by the user and then used to create new speaker models and/or update existing speaker models.
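
The workflow in the abstract above (compare each new message with existing speaker models, file it by caller, let the user tag unknown callers) can be sketched as follows. This is a deliberately simplified illustration, not the patented method: each speaker model is just a running mean of message feature vectors, matching uses Euclidean distance with a hypothetical `max_distance` cutoff, and the class and method names are invented for the example.

```python
import numpy as np

class VoicemailIndex:
    """Toy mailbox that files messages by the closest known speaker model."""

    def __init__(self, max_distance):
        self.max_distance = max_distance   # assumed cutoff for accepting a match
        self.models = {}                   # caller -> (mean feature vector, message count)
        self.mailbox = {}                  # caller -> list of message ids
        self.features = {}                 # message id -> summary feature vector

    def add_message(self, msg_id, frames):
        """Compare a new message with existing models and file it by caller."""
        vec = np.asarray(frames, dtype=float).mean(axis=0)
        self.features[msg_id] = vec
        caller = "unknown"
        if self.models:
            best = min(self.models, key=lambda c: np.linalg.norm(self.models[c][0] - vec))
            if np.linalg.norm(self.models[best][0] - vec) <= self.max_distance:
                caller = best
        self.mailbox.setdefault(caller, []).append(msg_id)
        return caller

    def tag(self, msg_id, caller):
        """User labels a message; create or update that caller's model."""
        for ids in self.mailbox.values():
            if msg_id in ids:
                ids.remove(msg_id)
        self.mailbox.setdefault(caller, []).append(msg_id)
        mean, count = self.models.get(caller, (np.zeros_like(self.features[msg_id]), 0))
        self.models[caller] = ((mean * count + self.features[msg_id]) / (count + 1), count + 1)

index = VoicemailIndex(max_distance=1.0)
first = index.add_message("m1", [[0.0, 0.0], [0.2, 0.0]])   # no models yet: "unknown"
index.tag("m1", "alice")                                    # user identifies the caller
second = index.add_message("m2", [[0.1, 0.1]])              # close to alice's model
third = index.add_message("m3", [[5.0, 5.0]])               # far from every model
```

After these calls the mailbox groups `m1` and `m2` under `alice` and leaves `m3` under `unknown` until the user tags it.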

5. Granted invention patent

US07930179B1 Unsupervised speaker segmentation of multi-speaker speech data (Status: in force)
Publication No.: US07930179B1
Publication date: 2011-04-19
Application No.: US11866125
Filing date: 2007-10-02
Applicants: Allen Louis Gorin, Zhu Liu, Sarangarajan Parthasarathy, Aaron Edward Rosenberg
Inventors: Allen Louis Gorin, Zhu Liu, Sarangarajan Parthasarathy, Aaron Edward Rosenberg
IPC classification: G10L17/00
CPC classification: G10L17/12
Abstract: Systems and methods for unsupervised segmentation of multi-speaker speech or audio data by speaker. A front-end analysis is applied to input speech data to obtain feature vectors. The speech data is initially segmented and then clustered into groups of segments that correspond to different speakers. The clusters are iteratively modeled and resegmented to obtain stable speaker segmentations. The overlap between segmentation sets is checked to ensure successful speaker segmentation. Overlapping segments are combined, remodeled, and resegmented. Optionally, the speech data is processed to produce a segmentation lattice to maximize the overall segmentation likelihood.
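
The segment, cluster, remodel, and resegment loop described in the abstract above can be sketched in a few lines. The sketch makes several simplifying assumptions that are not in the abstract: the number of speakers is known in advance, the initial segmentation is a fixed-length chunking, each cluster model is just a mean vector, and frames are reassigned by nearest mean. The overlap check and the segmentation lattice are not modeled, and the function name and parameters are hypothetical.

```python
import numpy as np

def segment_speakers(features, num_speakers, init_len=10, max_iter=20):
    """Return (start, end, speaker_label) segments for a (T, D) feature array."""
    features = np.asarray(features, dtype=float)
    total = len(features)

    # 1. Initial segmentation into fixed-length chunks, summarized by their means.
    starts = np.arange(0, total, init_len)
    seg_means = np.array([features[s:s + init_len].mean(axis=0) for s in starts])

    # 2. Cluster the segments: pick seeds by farthest-point selection, then
    #    assign each initial segment to its nearest seed.
    seeds = [seg_means[0]]
    while len(seeds) < num_speakers:
        dists = np.min([np.linalg.norm(seg_means - s, axis=1) for s in seeds], axis=0)
        seeds.append(seg_means[np.argmax(dists)])
    seeds = np.array(seeds)
    seg_labels = np.linalg.norm(seg_means[:, None, :] - seeds[None, :, :], axis=2).argmin(axis=1)
    labels = np.repeat(seg_labels, init_len)[:total]

    # 3. Iteratively remodel each cluster and resegment until the labels are stable.
    models = seeds.copy()
    for _ in range(max_iter):
        for k in range(num_speakers):
            if np.any(labels == k):
                models[k] = features[labels == k].mean(axis=0)
        new_labels = np.linalg.norm(
            features[:, None, :] - models[None, :, :], axis=2
        ).argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels

    # Collapse per-frame labels into contiguous segments.
    segments, start = [], 0
    for t in range(1, total + 1):
        if t == total or labels[t] != labels[start]:
            segments.append((start, t, int(labels[start])))
            start = t
    return segments

# Toy features: speaker A for 30 frames, speaker B for 30, speaker A again for 20.
toy = np.concatenate([np.zeros((30, 2)), np.full((30, 2), 10.0), np.zeros((20, 2))])
result = segment_speakers(toy, num_speakers=2)
# result == [(0, 30, 0), (30, 60, 1), (60, 80, 0)]
```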

6. Granted invention patent

US07451081B1 System and method of performing speech recognition based on a user identifier (Status: in force)
Publication No.: US07451081B1
Publication date: 2008-11-11
Application No.: US11685456
Filing date: 2007-03-13
Applicants: Bojana Gajic, Shrikanth Sambasivan Narayanan, Sarangarajan Parthasarathy, Richard Cameron Rose, Aaron Edward Rosenberg
Inventors: Bojana Gajic, Shrikanth Sambasivan Narayanan, Sarangarajan Parthasarathy, Richard Cameron Rose, Aaron Edward Rosenberg
IPC classification: G10L15/00
CPC classification: G10L15/07, G10L15/20
Abstract: Speech recognition models are dynamically re-configurable based on user information, application information, background information such as background noise, and transducer information such as transducer response characteristics, to provide users with alternate input modes to keyboard text entry. Word recognition lattices are generated for each data field of an application and dynamically concatenated into a single word recognition lattice. A language model is applied to the concatenated word recognition lattice to determine the relationships between the word recognition lattices, and the process is repeated until the generated word recognition lattices are acceptable or differ from a predetermined value only by a threshold amount. These techniques of dynamic re-configurable speech recognition provide for deployment of speech recognition on small devices such as mobile phones and personal digital assistants, as well as in environments such as office, home or vehicle, while maintaining the accuracy of the speech recognition.

7. Invention patent application

US20100166157A1 System and Method for Indexing Voice Mail Messages By Speaker (Status: in force)
Publication No.: US20100166157A1
Publication date: 2010-07-01
Application No.: US12648909
Filing date: 2009-12-29
Applicants: Julia Hirschberg, Sarangarajan Parthasarathy, Aaron Edward Rosenberg, Stephen Whittaker
Inventors: Julia Hirschberg, Sarangarajan Parthasarathy, Aaron Edward Rosenberg, Stephen Whittaker
IPC classification: H04M1/64
CPC classification: H04M3/533, G10L17/00, G10L17/04
Abstract: The invention provides a system and method for indexing and organizing voice mail messages by the speaker of the message. One or more speaker models are created from voice mail messages received. As additional messages are left, each of the new messages is compared with existing speaker models to determine the identity of its caller. The voice mail messages are organized within a user's mailbox by caller. Unknown callers may be identified and tagged by the user and then used to create new speaker models and/or update existing speaker models.

8. Invention patent application

US20090006088A1 SYSTEM AND METHOD OF PERFORMING SPEECH RECOGNITION BASED ON A USER IDENTIFIER (Status: in force)
Publication No.: US20090006088A1
Publication date: 2009-01-01
Application No.: US12207175
Filing date: 2008-09-09
Applicants: Bojana Gajic, Shrikanth Sambasivan Narayanan, Sarangarajan Parthasarathy, Richard Cameron Rose, Aaron Edward Rosenberg
Inventors: Bojana Gajic, Shrikanth Sambasivan Narayanan, Sarangarajan Parthasarathy, Richard Cameron Rose, Aaron Edward Rosenberg
IPC classification: G10L15/20
CPC classification: G10L15/07, G10L15/20
Abstract: Speech recognition models are dynamically re-configurable based on user information, application information, background information such as background noise, and transducer information such as transducer response characteristics, to provide users with alternate input modes to keyboard text entry. Word recognition lattices are generated for each data field of an application and dynamically concatenated into a single word recognition lattice. A language model is applied to the concatenated word recognition lattice to determine the relationships between the word recognition lattices, and the process is repeated until the generated word recognition lattices are acceptable or differ from a predetermined value only by a threshold amount. These techniques of dynamic re-configurable speech recognition provide for deployment of speech recognition on small devices such as mobile phones and personal digital assistants, as well as in environments such as office, home or vehicle, while maintaining the accuracy of the speech recognition.

9. Granted invention patent

US07209880B1 Systems and methods for dynamic re-configurable speech recognition (Status: in force)
Publication No.: US07209880B1
Publication date: 2007-04-24
Application No.: US10091689
Filing date: 2002-03-06
Applicants: Bojana Gajic, Shrikanth Sambasivan Narayanan, Sarangarajan Parthasarathy, Richard Cameron Rose, Aaron Edward Rosenberg
Inventors: Bojana Gajic, Shrikanth Sambasivan Narayanan, Sarangarajan Parthasarathy, Richard Cameron Rose, Aaron Edward Rosenberg
IPC classification: G10L15/00, G10L11/00, G06F15/00
CPC classification: G10L15/07, G10L15/20
Abstract: Speech recognition models are dynamically re-configurable based on user information, application information, background information such as background noise, and transducer information such as transducer response characteristics, to provide users with alternate input modes to keyboard text entry. Word recognition lattices are generated for each data field of an application and dynamically concatenated into a single word recognition lattice. A language model is applied to the concatenated word recognition lattice to determine the relationships between the word recognition lattices, and the process is repeated until the generated word recognition lattices are acceptable or differ from a predetermined value only by a threshold amount. These techniques of dynamic re-configurable speech recognition provide for deployment of speech recognition on small devices such as mobile phones and personal digital assistants, as well as in environments such as office, home or vehicle, while maintaining the accuracy of the speech recognition.

10. Granted invention patent

US5913192A Speaker identification with user-selected password phrases (Status: lapsed)
Publication No.: US5913192A
Publication date: 1999-06-15
Application No.: US916662
Filing date: 1997-08-22
Applicants: Sarangarajan Parthasarathy, Aaron Edward Rosenberg
Inventors: Sarangarajan Parthasarathy, Aaron Edward Rosenberg
IPC classification: G10L15/00, G10L17/00, G10L5/06, G10L9/00
CPC classification: G10L17/24, G10L15/1815, G10L2015/085
Abstract: A speaker identification system includes a speaker-independent phrase recognizer. The speaker-independent phrase recognizer scores a password utterance against all the sets of phonetic transcriptions in a lexicon database to determine the N best speaker-independent scores, determines the N best sets of phonetic transcriptions based on the N best speaker-independent scores, and determines the N best possible identities. A speaker-dependent phrase recognizer retrieves the hidden Markov model corresponding to each of the N best possible identities, and scores the password utterance against each of the N hidden Markov models to generate a speaker-dependent score for each of the N best possible identities. A score processor coupled to the outputs of the speaker-independent phrase recognizer and the speaker-dependent phrase recognizer determines a putative identity. A verifier coupled to the score processor authenticates the determined putative identity.
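
The two-pass flow in the abstract above (speaker-independent N-best pruning, speaker-dependent rescoring, score processing, verification) can be sketched as below. The abstract does not say how the score processor combines the two scores or how the verifier decides, so this sketch assumes the putative identity is the candidate with the best speaker-dependent score and that verification thresholds the difference between the speaker-dependent and speaker-independent scores. The scoring callables, table values, and names are hypothetical stand-ins for real recognizers.

```python
import heapq

def identify_speaker(utterance, lexicon, si_score, sd_score, n_best=3, accept_threshold=0.0):
    """Return (putative_identity, accepted).

    lexicon maps each enrolled identity to the transcription of its password phrase.
    si_score(utterance, transcription) and sd_score(utterance, identity) return
    log-likelihood-style scores (higher is better).
    """
    # Pass 1: speaker-independent scores against every enrolled transcription.
    si = {ident: si_score(utterance, trans) for ident, trans in lexicon.items()}
    candidates = heapq.nlargest(n_best, si, key=si.get)

    # Pass 2: speaker-dependent scores, only for the N best candidates.
    sd = {ident: sd_score(utterance, ident) for ident in candidates}

    # Score processor (assumption): pick the best speaker-dependent score.
    putative = max(sd, key=sd.get)

    # Verifier (assumption): normalize by the speaker-independent score and threshold.
    accepted = (sd[putative] - si[putative]) >= accept_threshold
    return putative, accepted

# Toy scoring tables standing in for real recognizers.
lexicon = {"alice": "open sesame", "bob": "blue moon", "carol": "red door", "dave": "green tea"}
si_table = {"open sesame": -10.0, "blue moon": -12.0, "red door": -30.0, "green tea": -11.0}
sd_table = {"alice": -9.0, "bob": -5.0, "dave": -20.0}   # carol is pruned in pass 1

outcome = identify_speaker(
    "utterance-1",
    lexicon,
    si_score=lambda utt, trans: si_table[trans],
    sd_score=lambda utt, ident: sd_table[ident],
)
# outcome == ("bob", True)
```

Restricting the speaker-dependent pass to N candidates keeps the cost of the second pass fixed as the number of enrolled users grows.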
