专利快速检索-快速检索全球专利，免费商用专利数据库-IPRDB

1. 发明申请

US20050075881A1 Voice tagging, voice annotation, and speech recognition for portable devices with optional post processing 有权
标题翻译：语音标记，语音注释和可选后置处理的便携式设备的语音识别
公开(公告)号：US20050075881A1
公开(公告)日：2005-04-07
申请号：US10677174
申请日：2003-10-02
申请人： Luca Rigazio , Robert Boman , Patrick Nguyen , Jean-Claude Junqua
发明人： Luca Rigazio , Robert Boman , Patrick Nguyen , Jean-Claude Junqua
IPC分类号： G10L15/26 , G10L21/00
CPC分类号： G06F17/30796 , G10L15/26
摘要： A media capture device has an audio input receptive of user speech relating to a media capture activity in close temporal relation to the media capture activity. A plurality of focused speech recognition lexica respectively relating to media capture activities are stored on the device, and a speech recognizer recognizes the user speech based on a selected one of the focused speech recognition lexica. A media tagger tags captured media with generated speech recognition text, and a media annotator annotates the captured media with a sample of the user speech that is suitable for input to a speech recognizer. Tagging and annotating are based on close temporal relation between receipt of the user speech and capture of the captured media. Annotations may be converted to tags during post processing, employed to edit a lexicon using letter-to-sound rules and spelled word input, or matched directly to speech to retrieve captured media.
摘要翻译：媒体捕获设备具有接收与媒体捕获活动紧密相关的媒体捕获活动的用户语音的音频输入。分别与媒体捕获活动相关的多个聚焦语音识别词典被存储在设备上，并且语音识别器基于所选择的一个焦点语音识别词典识别用户语音。媒体标签器使用生成的语音识别文本来标记捕获的媒体，并且媒体注释器用适合于输入到语音识别器的用户语音的样本来注释所捕获的媒体。标记和注释是基于用户语音的接收和捕获的媒体的捕获之间的紧密的时间关系。在后期处理中，注释可以转换为标签，用于使用字母对声音规则和拼写单词输入来编辑词典，或直接与语音匹配以检索所捕获的媒体。

2. 发明授权

US06697778B1 Speaker verification and speaker identification based on a priori knowledge 有权
标题翻译：基于先验知识的扬声器验证和扬声器识别
公开(公告)号：US06697778B1
公开(公告)日：2004-02-24
申请号：US09610495
申请日：2000-07-05
申请人： Roland Kuhn , Olivier Thyes , Patrick Nguyen , Jean-Claude Junqua , Robert Boman
发明人： Roland Kuhn , Olivier Thyes , Patrick Nguyen , Jean-Claude Junqua , Robert Boman
IPC分类号： G10L1506
CPC分类号： G10L17/02
摘要： Client speaker locations in a speaker space are used to generate speech models for comparison with test speaker data or test speaker speech models. The speaker space can be constructed using training speakers that are entirely separate from the population of client speakers, or from client speakers, or from a mix of training and client speakers. Reestimation of the speaker space based on client environment information is also provided to improve the likelihood that the client data will fall within the speaker space. During enrollment of the clients into the speaker space, additional client speech can be obtained when predetermined conditions are met. The speaker distribution can also be used in the client enrollment step.
摘要翻译：扬声器空间中的客户扬声器位置用于产生用于与测试扬声器数据或测试扬声器语音模型进行比较的语音模型。扬声器空间可以使用与客户端扬声器或客户端扬声器完全分开的训练扬声器，或者由训练和客户端扬声器组合构成。还提供了基于客户端环境信息对扬声器空间的再估计，以提高客户端数据落入扬声器空间的可能性。在将客户登记到扬声器空间中，当满足预定条件时，可以获得额外的客户端语音。扬声器分配也可以在客户端注册步骤中使用。

3. 发明授权

US07324943B2 Voice tagging, voice annotation, and speech recognition for portable devices with optional post processing 有权
标题翻译：语音标记，语音注释和可选后置处理的便携式设备的语音识别
公开(公告)号：US07324943B2
公开(公告)日：2008-01-29
申请号：US10677174
申请日：2003-10-02
申请人： Luca Rigazio , Robert Boman , Patrick Nguyen , Jean-Claude Junqua
发明人： Luca Rigazio , Robert Boman , Patrick Nguyen , Jean-Claude Junqua
IPC分类号： G10L21/00 , H04N5/76
CPC分类号： G06F17/30796 , G10L15/26
摘要： A media capture device has an audio input receptive of user speech relating to a media capture activity in close temporal relation to the media capture activity. A plurality of focused speech recognition lexica respectively relating to media capture activities are stored on the device, and a speech recognizer recognizes the user speech based on a selected one of the focused speech recognition lexica. A media tagger tags captured media with generated speech recognition text, and a media annotator annotates the captured media with a sample of the user speech that is suitable for input to a speech recognizer. Tagging and annotating are based on close temporal relation between receipt of the user speech and capture of the captured media. Annotations may be converted to tags during post processing, employed to edit a lexicon using letter-to-sound rules and spelled word input, or matched directly to speech to retrieve captured media.
摘要翻译：媒体捕获设备具有接收与媒体捕获活动紧密相关的媒体捕获活动的用户语音的音频输入。分别与媒体捕获活动相关的多个聚焦语音识别词典被存储在设备上，并且语音识别器基于所选择的一个焦点语音识别词典识别用户语音。媒体标签器使用生成的语音识别文本来标记捕获的媒体，并且媒体注释器用适合于输入到语音识别器的用户语音的样本来注释所捕获的媒体。标记和注释是基于用户语音的接收和捕获的媒体的捕获之间的紧密的时间关系。在后期处理中，注释可以转换为标签，用于使用字母对声音规则和拼写单词输入来编辑词典，或直接与语音匹配以检索所捕获的媒体。

4. 发明申请

US20050010411A1 Speech data mining for call center management 审中-公开
标题翻译：语音数据挖掘用于呼叫中心管理
公开(公告)号：US20050010411A1
公开(公告)日：2005-01-13
申请号：US10616006
申请日：2003-07-09
申请人： Luca Rigazio , Patrick Nguyen , Jean-Claude Junqua , Robert Boman
发明人： Luca Rigazio , Patrick Nguyen , Jean-Claude Junqua , Robert Boman
IPC分类号： G10L15/26 , G10L17/00 , G10L15/00
CPC分类号： G10L15/26 , G10L17/00
摘要： A speech data mining system for use in generating a rich transcription having utility in call center management includes a speech differentiation module differentiating between speech of interacting speakers, and a speech recognition module improving automatic recognition of speech of one speaker based on interaction with another speaker employed as a reference speaker. A transcript generation module generates a rich transcript based on recognized speech of the speakers. Focused, interactive language models improve recognition of a customer on a low quality channel using context extracted from speech of a call center operator on a high quality channel with a speech model adapted to the operator. Mined speech data includes number of interaction turns, customer frustration phrases, operator polity, interruptions, and/or contexts extracted from speech recognition results, such as topics, complaints, solutions, and resolutions. Mined speech data is useful in call center and/or product or service quality management.
摘要翻译：用于产生在呼叫中心管理中具有效用的丰富录音的语音数据挖掘系统包括区分交互式扬声器的语音的语音区分模块和改善一个扬声器的语音的自动识别的语音识别模块，作为参考发言人。转录本生成模块基于扬声器的识别语音生成丰富的录音。专注的交互式语言模型通过使用适合于操作员的语音模型，在高质量频道上从呼叫中心运营商的语音提取的上下文，改善对低质量信道上客户的识别。挖掘的语音数据包括从诸如主题，投诉，解决方案和分辨率的语音识别结果中提取的交互轮廓数量，客户沮丧短语，运营商政治，中断和/或上下文。挖掘的语音数据在呼叫中心和/或产品或服务质量管理中是有用的。

5. 发明申请

US20050228663A1 Media production system using time alignment to scripts 审中-公开
标题翻译：媒体制作系统使用时间对齐脚本
公开(公告)号：US20050228663A1
公开(公告)日：2005-10-13
申请号：US10814960
申请日：2004-03-31
申请人： Robert Boman , Patrick Nguyen , Jean-Claude Junqua
发明人： Robert Boman , Patrick Nguyen , Jean-Claude Junqua
IPC分类号： G10L15/26
CPC分类号： G10L15/26
摘要： A media production system includes a textual alignment module aligning multiple speech recordings to textual lines of a script based on speech recognition results. A navigation module responds to user navigation selections respective of the textual lines of the script by communicating to the user corresponding, line-specific portions of the multiple speech recordings. An editing module responds to user associations of multiple speech recordings with textual lines by accumulating line-specific portions of the multiple speech recordings in a combination recording based on at least one of relationships of textual lines in the script to the combination recording, and temporal alignments between the multiple speech recordings and the combination recording.
摘要翻译：媒体制作系统包括文本对准模块，其基于语音识别结果将多个语音记录与脚本的文本行对齐。导航模块通过与用户对应的多个语音记录的线特定部分通信来响应相应于脚本的文本行的用户导航选择。编辑模块通过基于脚本中的文本行的关系与组合记录中的至少一种相结合记录来组合记录中的多个语音记录的行特定部分来累积多个语音记录与文本行的响应，以及时间对齐在多个语音记录和组合记录之间。

6. 发明授权

US6141644A Speaker verification and speaker identification based on eigenvoices 失效
标题翻译：基于特征语音的扬声器验证和扬声器识别
公开(公告)号：US6141644A
公开(公告)日：2000-10-31
申请号：US148911
申请日：1998-09-04
申请人： Roland Kuhn , Patrick Nguyen , Jean-Claude Junqua , Robert Boman
发明人： Roland Kuhn , Patrick Nguyen , Jean-Claude Junqua , Robert Boman
IPC分类号： G10L15/06 , G10L15/10 , G10L17/00 , G10L15/14
CPC分类号： G10L17/02
摘要： Speech models are constructed and trained upon the speech of known client speakers (and also impostor speakers, in the case of speaker verification). Parameters from these models are concatenated to define supervectors and a linear transformation upon these supervectors results in a dimensionality reduction yielding a low-dimensional space called eigenspace. The training speakers are then represented as points or distributions in eigenspace. Thereafter, new speech data from the test speaker is placed into eigenspace through a similar linear transformation and the proximity in eigenspace of the test speaker to the training speakers serves to authenticate or identify the test speaker.
摘要翻译：语音模型根据已知的客户端扬声器的语音进行构建和训练（并且在演讲人验证的情况下也引用了演讲者）。来自这些模型的参数被连接以定义超级向量，并且这些超向量的线性变换导致维度降低，产生称为本征空间的低维空间。培训演讲者随后被表示为本土空间的分数或分布。此后，来自测试扬声器的新的语音数据通过类似的线性变换被放置到本征空间中，并且测试扬声器的本征空间与训练扬声器的接近度用于认证或识别测试扬声器。

7. 发明授权

US06901364B2 Focused language models for improved speech input of structured documents 有权
标题翻译：用于改进结构化文档语音输入的专注语言模型
公开(公告)号：US06901364B2
公开(公告)日：2005-05-31
申请号：US09951093
申请日：2001-09-13
申请人： Patrick Nguyen , Luca Rigazio , Jean-Claude Junqua
发明人： Patrick Nguyen , Luca Rigazio , Jean-Claude Junqua
IPC分类号： G10L15/18 , G10L15/28 , G10L15/26 , G06F17/20 , G10L21/00
CPC分类号： G10L15/1815 , G10L15/30
摘要： An e-mail message process is provided for use with a personal digital assistant which allows for the use of input speech messaging which is converted to text using a focused language model which is downloaded by a cellular phone connection to an Internet server which provides the focused language model based upon a topic for the intended e-mail message. The text that is generated from the input speech method can be summarized by the e-mail message processor and can be edited by the user. The generated e-mail message can then be transmitted again via cellular connection to an Internet e-mail server for transmitting the e-mail message to a recipient.
摘要翻译：提供电子邮件消息处理以与个人数字助理一起使用，该个人数字助理允许使用输入语音消息传送，其使用由通过蜂窝电话连接下载的聚焦语言模型转换为文本，该互联网服务器提供聚焦基于预期电子邮件的主题的语言模型。从输入语音方法生成的文本可以由电子邮件消息处理器来总结，并且可以由用户编辑。然后可以通过蜂窝连接再次将生成的电子邮件消息发送到Internet电子邮件服务器，以将电子邮件消息发送给接收者。

8. 发明授权

US06895376B2 Eigenvoice re-estimation technique of acoustic models for speech recognition, speaker identification and speaker verification 有权
标题翻译：用于语音识别，扬声器识别和说话人验证的声学模型的本征语重新估计技术
公开(公告)号：US06895376B2
公开(公告)日：2005-05-17
申请号：US09849174
申请日：2001-05-04
申请人： Florent Perronnin , Roland Kuhn , Patrick Nguyen , Jean-Claude Junqua
发明人： Florent Perronnin , Roland Kuhn , Patrick Nguyen , Jean-Claude Junqua
IPC分类号： G10L15/06 , G10L17/00
CPC分类号： G10L15/07 , G10L17/02
摘要： A reduced dimensionality eigenvoice analytical technique is used during training to develop context-dependent acoustic models for allophones. Re-estimation processes are performed to more strongly separate speaker-dependent and speaker-independent components of the speech model. The eigenvoice technique is also used during run time upon the speech of a new speaker. The technique removes individual speaker idiosyncrasies, to produce more universally applicable and robust allophone models. In one embodiment the eigenvoice technique is used to identify the centroid of each speaker, which may then be “subtracted out” of the recognition equation.
摘要翻译：在训练期间使用减小的维度本征语音分析技术来开发用于异音素的上下文相关的声学模型。执行重新估计过程以更强烈地分离语音模型的与扬声器相关的和与扬声器无关的组件。特定语音技术在运行时也用于新演讲者的演讲。该技术可以消除单个扬声器的特性，从而产生更普遍适用和强大的异音模型。在一个实施例中，本征语音技术用于识别每个说话者的质心，然后可以将其“减去”识别方程。

9. 发明授权

US06263309B1 Maximum likelihood method for finding an adapted speaker model in eigenvoice space 失效
标题翻译：在本征语音空间中找到适应的说话者模型的最大似然法
公开(公告)号：US06263309B1
公开(公告)日：2001-07-17
申请号：US09070054
申请日：1998-04-30
申请人： Patrick Nguyen , Roland Kuhn , Jean-Claude Junqua
发明人： Patrick Nguyen , Roland Kuhn , Jean-Claude Junqua
IPC分类号： G10L1508
CPC分类号： G10L15/07
摘要： A set of speaker dependent models is trained upon a comparatively large number of training speakers, one model per speaker, and model parameters are extracted in a predefined order to construct a set of supervectors, one per speaker. Principle component analysis is then performed on the set of supervectors to generate a set of eigenvectors that define an eigenvoice space. If desired, the number of vectors may be reduced to achieve data compression. Thereafter, a new speaker provides adaptation data from which a supervector is constructed by constraining this supervector to be in the eigenvoice space based on a maximum likelihood estimation. The resulting coefficients in the eigenspace of this new speaker may then be used to construct a new set of model parameters from which an adapted model is constructed for that speaker. Environmental adaptation may be performed by including environmental variations in the training data.
摘要翻译：一组扬声器依赖模型训练在相对较多数量的训练扬声器上，每个扬声器一个模型和模型参数以预定义的顺序提取，以构建一组超级矢量，每个扬声器一个。然后在一组超级矢量上执行原理分量分析，以生成一组定义本征语音空间的特征向量。如果需要，可以减少向量的数量以实现数据压缩。此后，新的说话者提供了通过基于最大似然估计将该超向量限制在本征语音空间中来构建超向量的适配数据。然后，可以使用这个新的说话者的本征空间中得到的系数来构建一组新的模型参数，从该模型参数构建适合于该说话者的适应模型。可以通过在训练数据中包括环境变化来执行环境适应。

10. 发明授权

US07089182B2 Method and apparatus for feature domain joint channel and additive noise compensation 有权
公开(公告)号：US07089182B2
公开(公告)日：2006-08-08
申请号：US10099305
申请日：2002-03-15
申请人： Younes Souilmi , Luca Rigazio , Patrick Nguyen , Jean-Claude Junqua
发明人： Younes Souilmi , Luca Rigazio , Patrick Nguyen , Jean-Claude Junqua
IPC分类号： G10L15/20
CPC分类号： G10L15/20 , G10L15/02 , G10L15/063 , G10L21/0216
摘要： A method for performing noise adaptation of a target speech signal input to a speech recognition system, where the target speech signal contains both additive and convolutional noises. The method includes estimating an additive noise bias and a convolutional noise bias; in the target speech signal; and jointly compensating the target speech signal for the additive and convolutional noise biases in a feature domain.

你已经成功收藏专利！

检索式保存成功!

IPRDB

热门服务

关于我们

友情链接

联系方式