    • 12. Invention application
    • PHOTO-REALISTIC SYNTHESIS OF THREE DIMENSIONAL ANIMATION WITH FACIAL FEATURES SYNCHRONIZED WITH SPEECH
    • US20120280974A1
    • 2012-11-08
    • US13099387
    • 2011-05-03
    • Lijuan Wang; Frank Soong; Qiang Huo; Zhengyou Zhang
    • Lijuan Wang; Frank Soong; Qiang Huo; Zhengyou Zhang
    • G06T13/40; G06T15/00
    • G06T13/40; G10L21/10; G10L2021/105
    • Dynamic texture mapping is used to create a photorealistic three dimensional animation of an individual with facial features synchronized with desired speech. Audiovisual data of an individual reading a known script is obtained and stored in an audio library and an image library. The audiovisual data is processed to extract feature vectors used to train a statistical model. An input audio feature vector corresponding to desired speech with which the animation will be synchronized is provided. The statistical model is used to generate a trajectory of visual feature vectors that corresponds to the input audio feature vector. These visual feature vectors are used to identify a matching image sequence from the image library. The resulting sequence of images, concatenated from the image library, provides a photorealistic image sequence with facial features, such as lip movements, synchronized with the desired speech. This image sequence is applied to the three-dimensional model.
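The matching step in the abstract above (using generated visual feature vectors to pick images from the image library) can be illustrated with a minimal sketch. It assumes a plain per-frame nearest-neighbor lookup under squared Euclidean distance; the abstract does not say how the match is computed, so the distance measure, the function names `squared_distance` and `match_image_sequence`, and the toy data are all hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def match_image_sequence(
    trajectory: List[Vector], library_features: List[Vector]
) -> List[int]:
    """For each predicted visual feature vector, return the index of the
    closest image in the library (assumed matching rule, not the patent's)."""
    if not library_features:
        raise ValueError("image library is empty")
    return [
        min(
            range(len(library_features)),
            key=lambda i: squared_distance(target, library_features[i]),
        )
        for target in trajectory
    ]


# Toy image library: one visual feature vector per stored image (made-up values).
library = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
# Toy trajectory, standing in for the statistical model's output.
trajectory = [(0.1, -0.1), (0.9, 0.2), (0.8, 0.9)]

frames = match_image_sequence(trajectory, library)
print(frames)  # indices of the library images to concatenate, in order
```

A per-frame lookup ignores how smoothly consecutive images join; a fuller selection scheme would presumably also weigh continuity between neighboring frames.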
    • 15. Invention application
    • Unnatural prosody detection in speech synthesis
    • US20090083036A1
    • 2009-03-26
    • US11903020
    • 2007-09-20
    • Yong Zhao; Frank Kao-ping Soong; Min Chu; Lijuan Wang
    • Yong Zhao; Frank Kao-ping Soong; Min Chu; Lijuan Wang
    • G10L13/08; G06F17/30
    • G10L13/10
    • Described is a technology by which synthesized speech generated from text is evaluated against a prosody model (trained offline) to determine whether the speech will sound unnatural. If so, the speech is regenerated with modified data. The evaluation and regeneration may be iterative until deemed natural sounding. For example, text is built into a lattice that is then (e.g., Viterbi) searched to find a best path. The sections (e.g., units) of data on the path are evaluated via a prosody model. If the evaluation deems a section to correspond to unnatural prosody, that section is replaced, e.g., by modifying/pruning the lattice and re-performing the search. Replacement may be iterative until all sections pass the evaluation. Unnatural prosody detection may be biased such that during evaluation, unnatural prosody is falsely detected at a higher rate relative to a rate at which unnatural prosody is missed.
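The search, evaluate, and replace loop described in the abstract above can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented method: the lattice is a list of columns mapping hypothetical unit names to target costs, `join_cost` and `is_natural` are placeholder callables standing in for the concatenation cost and the offline-trained prosody model, and pruning simply deletes a rejected unit before the search is rerun.

```python
from typing import Callable, Dict, List

# One dict per position: candidate unit name -> target cost (hypothetical layout).
Lattice = List[Dict[str, float]]


def viterbi(lattice: Lattice, join_cost: Callable[[str, str], float]) -> List[str]:
    """Return the cheapest path through the lattice, one unit per position."""
    # best[u] = (cost of the cheapest path ending in unit u, that path)
    best = {u: (c, [u]) for u, c in lattice[0].items()}
    for column in lattice[1:]:
        nxt = {}
        for u, target_cost in column.items():
            prev_cost, prev_path = min(
                ((pc + join_cost(p[-1], u), p) for pc, p in best.values()),
                key=lambda t: t[0],
            )
            nxt[u] = (prev_cost + target_cost, prev_path + [u])
        best = nxt
    return min(best.values(), key=lambda t: t[0])[1]


def synthesize(
    lattice: Lattice,
    join_cost: Callable[[str, str], float],
    is_natural: Callable[[str], bool],
    max_rounds: int = 10,
) -> List[str]:
    """Search, evaluate each unit on the path, prune rejected units, repeat."""
    lattice = [dict(column) for column in lattice]  # work on a copy
    path = viterbi(lattice, join_cost)
    for _ in range(max_rounds):
        # Only prune where another candidate remains at that position.
        rejected = [
            (i, u)
            for i, u in enumerate(path)
            if not is_natural(u) and len(lattice[i]) > 1
        ]
        if not rejected:
            break
        for i, u in rejected:
            del lattice[i][u]
        path = viterbi(lattice, join_cost)
    return path


# Toy lattice with two positions and two candidate units each (made-up costs).
lattice = [{"a1": 1.0, "a2": 2.0}, {"b1": 1.0, "b2": 1.5}]


def join_cost(prev: str, nxt: str) -> float:
    return 0.1  # constant placeholder concatenation cost


def is_natural(unit: str) -> bool:
    return unit != "a1"  # placeholder for the prosody model's verdict


first_path = viterbi(lattice, join_cost)
final_path = synthesize(lattice, join_cost, is_natural)
print(first_path, final_path)
```

The bias mentioned in the abstract (accepting more false alarms than misses) would correspond to making `is_natural` strict, so that borderline units are rejected and replaced rather than let through.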