Quick Patent Search: fast search of global patents, free commercial-use patent database - IPRDB

11. Invention application

US20050228663A1 Media production system using time alignment to scripts (status: pending, published)
Publication No.: US20050228663A1
Publication date: 2005-10-13
Application No.: US10814960
Filing date: 2004-03-31
Applicant: Robert Boman, Patrick Nguyen, Jean-Claude Junqua
Inventor: Robert Boman, Patrick Nguyen, Jean-Claude Junqua
IPC classification: G10L15/26
CPC classification: G10L15/26
Abstract: A media production system includes a textual alignment module aligning multiple speech recordings to textual lines of a script based on speech recognition results. A navigation module responds to user navigation selections respective of the textual lines of the script by communicating to the user corresponding, line-specific portions of the multiple speech recordings. An editing module responds to user associations of multiple speech recordings with textual lines by accumulating line-specific portions of the multiple speech recordings in a combination recording based on at least one of relationships of textual lines in the script to the combination recording, and temporal alignments between the multiple speech recordings and the combination recording.
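The alignment step in this abstract can be pictured with a minimal sketch. It assumes each recording has already been run through a recognizer and reduced to a text string; the function name `align_takes_to_script`, the take identifiers, and the use of word-level `difflib.SequenceMatcher` similarity are illustrative assumptions, not the method claimed in the patent.

```python
from difflib import SequenceMatcher


def align_takes_to_script(script_lines, recognized_takes):
    """Map each script line index to the takes whose recognized text matches it best.

    script_lines: list of script lines (strings).
    recognized_takes: dict of take id -> recognizer output (string).
    Assumption: similarity is a plain word-level SequenceMatcher ratio.
    """
    alignment = {index: [] for index in range(len(script_lines))}
    for take_id, text in recognized_takes.items():
        words = text.lower().split()
        # Score this take against every script line.
        scores = [
            SequenceMatcher(None, words, line.lower().split()).ratio()
            for line in script_lines
        ]
        best_line = max(range(len(script_lines)), key=scores.__getitem__)
        alignment[best_line].append(take_id)
    return alignment


script = ["to be or not to be", "that is the question"]
takes = {
    "take1": "to be or not to bee",
    "take2": "that is a question",
    "take3": "to be or not to be",
}
alignment = align_takes_to_script(script, takes)
```

With such a mapping, a navigation module could look up every take recorded for a selected script line, which is the behaviour the abstract describes at a high level.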

12. Granted invention patent

US06272462B1 Supervised adaptation using corrective N-best decoding (status: lapsed)
Publication No.: US06272462B1
Publication date: 2001-08-07
Application No.: US09257893
Filing date: 1999-02-25
Applicant: Patrick Nguyen, Philippe Gelin, Jean-Claude Junqua
Inventor: Patrick Nguyen, Philippe Gelin, Jean-Claude Junqua
IPC classification: G10L15/06
CPC classification: G10L15/075, G10L2015/0635
Abstract: Supervised adaptation speech is supplied to the recognizer and the recognizer generates the N-best transcriptions of the adaptation speech. These transcriptions include the one transcription known to be correct, based on a priori knowledge of the adaptation speech, and the remaining transcriptions known to be incorrect. The system applies weights to each transcription: a positive weight to the correct transcription and negative weights to the incorrect transcriptions. These weights have the effect of moving the incorrect transcriptions away from the correct one, rendering the recognition system more discriminative for the new speaker's speaking characteristics. Weights applied to the incorrect solutions are based on the respective likelihood scores generated by the recognizer. The sum of all weights (positive and negative) is a positive number. This ensures that the system will converge.
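The weighting rule in this abstract (one positive weight, likelihood-based negative weights, positive total) can be sketched as follows. The exact formula is not given in the abstract, so the scheme below is an assumption: the correct transcription gets +1 and the incorrect ones share a total of `-rho`, split in proportion to their likelihoods, with `0 <= rho < 1` so that the sum stays positive. The names `corrective_weights` and `rho` are hypothetical.

```python
import math


def corrective_weights(log_likelihoods, correct_index, rho=0.5):
    """Return one weight per N-best transcription.

    Assumed scheme (not taken from the patent): +1 for the correct
    transcription; the incorrect ones share -rho in proportion to their
    likelihoods, so the weights sum to 1 - rho > 0.
    """
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must be in [0, 1) to keep the weight sum positive")
    wrong = [i for i in range(len(log_likelihoods)) if i != correct_index]
    weights = [0.0] * len(log_likelihoods)
    weights[correct_index] = 1.0
    if wrong:
        # Subtract the largest score before exponentiating for numerical safety.
        top = max(log_likelihoods[i] for i in wrong)
        scaled = {i: math.exp(log_likelihoods[i] - top) for i in wrong}
        total = sum(scaled.values())
        for i in wrong:
            weights[i] = -rho * scaled[i] / total
    return weights


weights = corrective_weights([-10.0, -11.0, -13.0], correct_index=1, rho=0.5)
```

In this sketch the most likely incorrect transcription receives the most negative weight, which matches the abstract's statement that the negative weights depend on the recognizer's likelihood scores.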

13. Granted invention patent

US06697778B1 Speaker verification and speaker identification based on a priori knowledge (status: in force)
Publication No.: US06697778B1
Publication date: 2004-02-24
Application No.: US09610495
Filing date: 2000-07-05
Applicant: Roland Kuhn, Olivier Thyes, Patrick Nguyen, Jean-Claude Junqua, Robert Boman
Inventor: Roland Kuhn, Olivier Thyes, Patrick Nguyen, Jean-Claude Junqua, Robert Boman
IPC classification: G10L15/06
CPC classification: G10L17/02
Abstract: Client speaker locations in a speaker space are used to generate speech models for comparison with test speaker data or test speaker speech models. The speaker space can be constructed using training speakers that are entirely separate from the population of client speakers, or from client speakers, or from a mix of training and client speakers. Reestimation of the speaker space based on client environment information is also provided to improve the likelihood that the client data will fall within the speaker space. During enrollment of the clients into the speaker space, additional client speech can be obtained when predetermined conditions are met. The speaker distribution can also be used in the client enrollment step.

14. Granted invention patent

US06343267B1 Dimensionality reduction for speaker normalization and speaker and environment adaptation using eigenvoice techniques (status: in force)
Publication No.: US06343267B1
Publication date: 2002-01-29
Application No.: US09148753
Filing date: 1998-09-04
Applicant: Roland Kuhn, Patrick Nguyen, Jean-Claude Junqua
Inventor: Roland Kuhn, Patrick Nguyen, Jean-Claude Junqua
IPC classification: G10L19/08
CPC classification: G06K9/6247, G10L15/07
Abstract: A set of speaker dependent models or adapted models is trained upon a comparatively large number of training speakers, one model per speaker, and model parameters are extracted in a predefined order to construct a set of supervectors, one per speaker. Dimensionality reduction is then performed on the set of supervectors to generate a set of eigenvectors that define an eigenvoice space. If desired, the number of vectors may be reduced to achieve data compression. Thereafter, a new speaker provides adaptation data from which a supervector is constructed by constraining this supervector to be in the eigenvoice space based on a maximum likelihood estimation. The resulting coefficients in the eigenspace of this new speaker may then be used to construct a new set of model parameters from which an adapted model is constructed for that speaker. The adapted model may then be further adapted via MAP, MLLR, MLED or the like. The eigenvoice technique may be applied to MLLR transformation matrices or the like; Bayesian estimation performed in eigenspace uses prior knowledge about speaker space density to refine the estimate about the location of a new speaker in eigenspace.
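A toy sketch of the pipeline in this abstract: build supervectors, reduce dimensionality, then constrain a new speaker to the resulting space. Two simplifications are assumptions of this sketch and not the patented method: the dimensionality reduction is principal component analysis computed by power iteration, and the new speaker is placed by an ordinary least-squares projection rather than the maximum likelihood estimation the abstract names. All function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def principal_axes(supervectors, k, iters=100):
    """Return (mean, basis): the mean supervector and up to k orthonormal axes.

    Axes are found by power iteration with deflation on the scatter matrix
    of the centered supervectors (a stand-in for any PCA routine).
    """
    count = len(supervectors)
    mean = [sum(col) / count for col in zip(*supervectors)]
    centered = [[x - m for x, m in zip(v, mean)] for v in supervectors]
    dim = len(mean)
    basis = []
    for j in range(k):
        v = [1.0 + 0.1 * ((i + j) % dim) for i in range(dim)]  # arbitrary start
        for _ in range(iters):
            # Multiply by the scatter matrix: S v = sum over rows r of (r . v) r
            coeffs = [dot(r, v) for r in centered]
            v = [sum(c * r[i] for c, r in zip(coeffs, centered)) for i in range(dim)]
            # Remove components along axes that were already found.
            for axis in basis:
                p = dot(v, axis)
                v = [x - p * y for x, y in zip(v, axis)]
            norm = math.sqrt(dot(v, v))
            if norm < 1e-12:
                return mean, basis  # no variance left to explain
            v = [x / norm for x in v]
        basis.append(v)
    return mean, basis


def adapt(new_supervector, mean, basis):
    """Project a new speaker into the eigenspace and rebuild an adapted supervector."""
    centered = [x - m for x, m in zip(new_supervector, mean)]
    coeffs = [dot(centered, axis) for axis in basis]
    adapted = list(mean)
    for c, axis in zip(coeffs, basis):
        adapted = [a + c * y for a, y in zip(adapted, axis)]
    return coeffs, adapted


training = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0]]
mean, basis = principal_axes(training, k=1)
coeffs, adapted = adapt([4.0, 8.0, 13.0], mean, basis)
```

The adapted supervector is described by only `len(basis)` coefficients, which illustrates why a small amount of adaptation data can be enough to place a new speaker.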

15. Granted invention patent

US06205426B1 Unsupervised speech model adaptation using reliable information among N-best strings (status: lapsed)
Publication No.: US06205426B1
Publication date: 2001-03-20
Application No.: US09237170
Filing date: 1999-01-25
Applicant: Patrick Nguyen, Philippe Gelin, Jean-Claude Junqua
Inventor: Patrick Nguyen, Philippe Gelin, Jean-Claude Junqua
IPC classification: G10L15/14
CPC classification: G10L15/065
Abstract: The system performs unsupervised speech model adaptation using the recognizer to generate the N-best solutions for an input utterance. Each of these N-best solutions is tested by a reliable information extraction process. Reliable information is extracted by a weighting technique based on likelihood scores generated by the recognizer, or by a non-linear thresholding function. The system may be used in a single pass implementation or iteratively in a multi-pass implementation.
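The two extraction options named in this abstract, likelihood-based weighting and non-linear thresholding, can be illustrated with a small sketch. The concrete choices here are assumptions: weights are a normalized exponential of the log-likelihoods, the threshold is a fixed margin below the best score, and a word counts as reliable when hypotheses carrying enough total weight agree on it at the same position. The function names and the `margin` and `min_support` parameters are hypothetical.

```python
import math


def likelihood_weights(log_likelihoods):
    """Weight each hypothesis by its normalized likelihood (assumed scheme)."""
    top = max(log_likelihoods)
    scaled = [math.exp(score - top) for score in log_likelihoods]
    total = sum(scaled)
    return [value / total for value in scaled]


def threshold_weights(log_likelihoods, margin=2.0):
    """Step function: keep hypotheses scoring within `margin` of the best one."""
    top = max(log_likelihoods)
    return [1.0 if top - score <= margin else 0.0 for score in log_likelihoods]


def reliable_positions(hypotheses, weights, min_support=0.8):
    """Flag each word of the top hypothesis that enough weighted hypotheses confirm.

    Assumes hypotheses are word lists sorted best first.
    """
    best = hypotheses[0]
    flags = []
    for position, word in enumerate(best):
        support = sum(
            weight
            for hypothesis, weight in zip(hypotheses, weights)
            if position < len(hypothesis) and hypothesis[position] == word
        )
        flags.append(support >= min_support)
    return flags


nbest = [["call", "john", "now"], ["call", "joan", "now"], ["fall", "john", "now"]]
scores = [-5.0, -6.0, -9.0]
soft = likelihood_weights(scores)
hard = threshold_weights(scores, margin=2.0)
flags = reliable_positions(nbest, soft)
```

Only the segments flagged as reliable would then feed the adaptation step, and the whole procedure could be repeated on the adapted models for a multi-pass run.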

16. Granted invention patent

US06970820B2 Voice personalization of speech synthesizer (status: in force)
Publication No.: US06970820B2
Publication date: 2005-11-29
Application No.: US09792928
Filing date: 2001-02-26
Applicant: Jean-Claude Junqua, Florent Perronnin, Roland Kuhn, Patrick Nguyen
Inventor: Jean-Claude Junqua, Florent Perronnin, Roland Kuhn, Patrick Nguyen
IPC classification: G10L13/08, G10L13/02, G10L13/04, G10L13/06, G10L21/00, G10L13/00
CPC classification: G10L13/04, G10L2021/0135
Abstract: The speech synthesizer is personalized to sound like or mimic the speech characteristics of an individual speaker. The individual speaker provides a quantity of enrollment data, which can be extracted from a short quantity of speech, and the system modifies the base synthesis parameters to more closely resemble those of the new speaker. More specifically, the synthesis parameters may be decomposed into speaker dependent parameters, such as context-independent parameters, and speaker independent parameters, such as context dependent parameters. The speaker dependent parameters are adapted using enrollment data from the new speaker. After adaptation, the speaker dependent parameters are combined with the speaker independent parameters to provide a set of personalized synthesis parameters. To adapt the parameters with a small amount of enrollment data, an eigenspace is constructed and used to constrain the position of the new speaker so that context independent parameters not provided by the new speaker may be estimated.

17. Granted invention patent

US6141644A Speaker verification and speaker identification based on eigenvoices (status: lapsed)
Publication No.: US6141644A
Publication date: 2000-10-31
Application No.: US148911
Filing date: 1998-09-04
Applicant: Roland Kuhn, Patrick Nguyen, Jean-Claude Junqua, Robert Boman
Inventor: Roland Kuhn, Patrick Nguyen, Jean-Claude Junqua, Robert Boman
IPC classification: G10L15/06, G10L15/10, G10L17/00, G10L15/14
CPC classification: G10L17/02
Abstract: Speech models are constructed and trained upon the speech of known client speakers (and also impostor speakers, in the case of speaker verification). Parameters from these models are concatenated to define supervectors and a linear transformation upon these supervectors results in a dimensionality reduction yielding a low-dimensional space called eigenspace. The training speakers are then represented as points or distributions in eigenspace. Thereafter, new speech data from the test speaker is placed into eigenspace through a similar linear transformation and the proximity in eigenspace of the test speaker to the training speakers serves to authenticate or identify the test speaker.
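The proximity test at the end of this abstract can be sketched as below. The sketch assumes the eigenspace (a mean supervector and an orthonormal basis) has already been built, represents each client as a single point rather than a distribution, and uses Euclidean distance with a hand-picked threshold. The names `project`, `identify`, `verify`, and the sample data are hypothetical.

```python
import math


def project(supervector, mean, basis):
    """Place a supervector in eigenspace: center it, then take dot products with each axis."""
    centered = [x - m for x, m in zip(supervector, mean)]
    return [sum(c * b for c, b in zip(centered, axis)) for axis in basis]


def identify(test_point, client_points):
    """Speaker identification: return the client whose eigenspace point is nearest."""
    return min(client_points, key=lambda name: math.dist(test_point, client_points[name]))


def verify(test_point, claimed_point, threshold):
    """Speaker verification: accept the claim if the points are close enough."""
    return math.dist(test_point, claimed_point) <= threshold


# Assumed toy eigenspace: 3-dimensional supervectors reduced to 2 dimensions.
mean = [0.0, 0.0, 0.0]
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
clients = {"alice": [1.0, 0.0], "bob": [0.0, 2.0]}

test_point = project([0.9, 0.2, 5.0], mean, basis)
who = identify(test_point, clients)
accepted = verify(test_point, clients["alice"], threshold=0.5)
rejected = verify(test_point, clients["bob"], threshold=0.5)
```

Identification picks the nearest client, while verification compares a single claimed identity against a threshold; in practice the threshold would be tuned on impostor data, which the abstract mentions for the verification case.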

18. Invention application

US20050038655A1 Bubble splitting for compact acoustic modeling (status: in force)
Publication No.: US20050038655A1
Publication date: 2005-02-17
Application No.: US10639974
Filing date: 2003-08-13
Applicant: Ambroise Mutel, Patrick Nguyen, Luca Rigazio
Inventor: Ambroise Mutel, Patrick Nguyen, Luca Rigazio
IPC classification: G10L11/00, G10L15/02, G10L15/06, G10L15/14
CPC classification: G10L15/063, G10L15/144, G10L2015/0631, G10L2015/0638
Abstract: An improved method is provided for constructing compact acoustic models for use in a speech recognizer. The method includes: partitioning speech data from a plurality of training speakers according to at least one speech related criteria (i.e., vocal tract length); grouping together the partitioned speech data from training speakers having a similar speech characteristic; and training an acoustic bubble model for each group using the speech data within the group.
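The partition, group, and train steps listed in this abstract can be sketched in a few lines. The sketch assumes each training speaker already has a vocal tract length warping factor, groups speakers by equal-width bins of that factor, and uses a per-dimension mean of the group's feature frames as a stand-in for real acoustic model training. The function names, the `bin_width` parameter, and the sample data are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def split_into_bubbles(speaker_warp, bin_width=0.05):
    """Group speakers whose warping factors fall in the same bin."""
    bubbles = defaultdict(list)
    for speaker, warp in speaker_warp.items():
        bubbles[round(warp / bin_width)].append(speaker)
    return dict(bubbles)


def train_bubble_models(bubbles, speaker_frames):
    """Train one placeholder model per bubble: the mean feature vector of its frames."""
    models = {}
    for key, speakers in bubbles.items():
        frames = [frame for speaker in speakers for frame in speaker_frames[speaker]]
        models[key] = [fmean(column) for column in zip(*frames)]
    return models


warps = {"s1": 0.90, "s2": 0.92, "s3": 1.10}
frames = {"s1": [[1.0, 2.0]], "s2": [[3.0, 4.0]], "s3": [[10.0, 10.0]]}
bubbles = split_into_bubbles(warps)
models = train_bubble_models(bubbles, frames)
```

Because each bubble only covers speakers with similar characteristics, each model has less variability to represent, which is the intuition behind the compactness claimed in the abstract.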

19. Granted invention patent

US07328154B2 Bubble splitting for compact acoustic modeling (status: in force)
Publication No.: US07328154B2
Publication date: 2008-02-05
Application No.: US10639974
Filing date: 2003-08-13
Applicant: Ambroise Mutel, Patrick Nguyen, Luca Rigazio
Inventor: Ambroise Mutel, Patrick Nguyen, Luca Rigazio
IPC classification: G10L15/00
CPC classification: G10L15/063, G10L15/144, G10L2015/0631, G10L2015/0638
Abstract: An improved method is provided for constructing compact acoustic models for use in a speech recognizer. The method includes: partitioning speech data from a plurality of training speakers according to at least one speech related criteria (i.e., vocal tract length); grouping together the partitioned speech data from training speakers having a similar speech characteristic; and training an acoustic bubble model for each group using the speech data within the group.

20. Granted invention patent

US06879954B2 Pattern matching for large vocabulary speech recognition systems (status: in force)
Publication No.: US06879954B2
Publication date: 2005-04-12
Application No.: US10127184
Filing date: 2002-04-22
Applicant: Patrick Nguyen, Luca Rigazio
Inventor: Patrick Nguyen, Luca Rigazio
IPC classification: G10L15/08, G10L15/10, G10L15/28, G10L15/00, G06F15/76
CPC classification: G10L15/08, G10L15/10, G10L15/285, G10L15/30, G10L15/34
Abstract: A method is provided for improving pattern matching in a speech recognition system having a plurality of acoustic models. The improved method includes: receiving continuous speech input; generating a sequence of acoustic feature vectors that represent temporal and spectral behavior of the speech input; loading a first group of acoustic feature vectors from the sequence of acoustic feature vectors into a memory workspace accessible to a processor; loading an acoustic model from the plurality of acoustic models into the memory workspace; and determining a similarity measure for each acoustic feature vector of the first group of acoustic feature vectors in relation to the acoustic model. Prior to retrieving another group of acoustic feature vectors, similarity measures are computed for the first group of acoustic feature vectors in relation to each of the acoustic models employed by the speech recognition system. In this way, the improved method reduces the number of I/O operations associated with loading and unloading each acoustic model into memory.
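The loop ordering described in this abstract can be sketched as follows: a group of feature vectors is held in memory, each acoustic model is loaded once per group, and every vector in the group is scored against it before the next model is loaded. The function `score_in_blocks`, the `load_model` and `similarity` callables, and the negative squared distance used as a similarity measure are assumptions for illustration only.

```python
def score_in_blocks(features, model_ids, load_model, similarity, block_size):
    """Score every feature vector against every model, one block of vectors at a time.

    Returns (scores, loads), where scores maps (frame index, model id) to a
    similarity value and loads counts how often a model was loaded.
    """
    scores = {}
    loads = 0
    for start in range(0, len(features), block_size):
        block = features[start:start + block_size]  # group kept in the workspace
        for model_id in model_ids:
            model = load_model(model_id)  # one load per model per block
            loads += 1
            for offset, vector in enumerate(block):
                scores[(start + offset, model_id)] = similarity(vector, model)
    return scores, loads


def neg_squared_distance(vector, model):
    """Toy similarity: higher (closer to zero) means a better match."""
    return -sum((x - m) ** 2 for x, m in zip(vector, model))


stored_models = {"a": [0.0], "b": [3.0]}
features = [[0.0], [1.0], [2.0], [3.0]]

scores, blocked_loads = score_in_blocks(
    features, ["a", "b"], stored_models.__getitem__, neg_squared_distance, block_size=2
)
_, per_frame_loads = score_in_blocks(
    features, ["a", "b"], stored_models.__getitem__, neg_squared_distance, block_size=1
)
```

In this toy run, scoring in blocks of two halves the number of model loads compared with loading every model for every frame, while producing the same set of similarity scores.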
