    • 1. Granted patent
    • Title: Trainable videorealistic speech animation
    • Publication number: US07168953B1
    • Publication date: 2007-01-30
    • Application number: US10352319
    • Filing date: 2003-01-27
    • Inventors: Tomaso A. Poggio; Antoine F. Ezzat
    • Applicants: Tomaso A. Poggio; Antoine F. Ezzat
    • IPC: G09B19/04
    • CPC: G06T13/205; G06T13/40; G10L2021/105
    • Abstract: A method and apparatus for videorealistic speech animation is disclosed. A human subject is recorded using a video camera as he/she utters a predetermined speech corpus. After processing the corpus automatically, a visual speech module is learned from the data that is capable of synthesizing the human subject's mouth uttering entirely novel utterances that were not recorded in the original video. The synthesized utterance is re-composited onto a background sequence which contains natural head and eye movement. The final output is videorealistic in the sense that it looks like a video camera recording of the subject. The two key components of this invention are 1) a multidimensional morphable model (MMM) to synthesize new, previously unseen mouth configurations from a small set of mouth image prototypes; and 2) a trajectory synthesis technique based on regularization, which is automatically trained from the recorded video corpus, and which is capable of synthesizing trajectories in MMM space corresponding to any desired utterance.
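The trajectory-synthesis component described in the abstract above fits a smooth path through MMM parameter space using regularization. A minimal sketch of that idea, assuming a plain Tikhonov formulation with a second-difference smoothness penalty (the function name, the specific penalty, and the closed-form solve are illustrative, not the patent's actual formulation):

```python
import numpy as np

def synthesize_trajectory(targets, lam=1.0):
    """Regularized trajectory through per-frame parameter targets.

    targets: (T, d) array of desired MMM-parameter keypoints, one per frame.
    Minimizes ||y - targets||^2 + lam * ||D2 y||^2, where D2 is the
    second-difference operator, trading fidelity to the targets for
    smoothness of the resulting trajectory.
    """
    T = targets.shape[0]
    # Second-difference matrix D2 of shape (T-2, T); D2 @ y approximates
    # the acceleration of the trajectory at each interior frame.
    D2 = np.zeros((T - 2, T))
    for i in range(T - 2):
        D2[i, i:i + 3] = [1.0, -2.0, 1.0]
    # Normal equations of the quadratic objective: (I + lam * D2^T D2) y = targets.
    A = np.eye(T) + lam * D2.T @ D2
    return np.linalg.solve(A, targets)
```

With `lam=0` the solver returns the targets unchanged; increasing `lam` flattens rapid zigzags in the parameter track, which is the qualitative behavior the abstract attributes to regularization-based trajectory synthesis.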
    • 2. Granted patent
    • Title: Talking facial display method and apparatus
    • Publication number: US06250928B1
    • Publication date: 2001-06-26
    • Application number: US09223858
    • Filing date: 1998-12-31
    • Inventors: Tomaso A. Poggio; Antoine F. Ezzat
    • Applicants: Tomaso A. Poggio; Antoine F. Ezzat
    • IPC: G09B19/04
    • CPC: G09B19/04
    • Abstract: A method and apparatus for converting input text into an audio-visual speech stream, resulting in a talking face image enunciating the text. This method of converting input text into an audio-visual speech stream comprises the steps of: recording a visual corpus of a human subject, building a viseme interpolation database, and synchronizing the talking face image with the text stream. In a preferred embodiment, viseme transitions are automatically calculated using optical flow methods, and morphing techniques are employed to produce smooth viseme transitions. The viseme transitions are concatenated together and synchronized with the phonemes according to the timing information. The audio-visual speech stream is then displayed in real time, thereby displaying a photo-realistic talking face.
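The viseme interpolation and phoneme-timing synchronization described in the abstract above can be illustrated with a toy sketch. A plain linear cross-dissolve stands in for the patent's optical-flow-based morphing, and all names and data shapes here are illustrative assumptions:

```python
import numpy as np

def morph_sequence(visemes, timings, fps=30):
    """Generate frames by cross-dissolving between viseme images.

    visemes: dict mapping phoneme label -> grayscale image array (H, W).
    timings: list of (phoneme, onset_seconds) pairs, sorted by onset.
    Returns a list of float frames spanning the first to last phoneme
    onset, interpolating each consecutive viseme pair according to the
    timing information (a stand-in for optical-flow morphing).
    """
    frames = []
    for (p0, t0), (p1, t1) in zip(timings, timings[1:]):
        n = max(1, int(round((t1 - t0) * fps)))
        a, b = visemes[p0].astype(float), visemes[p1].astype(float)
        for k in range(n):
            alpha = k / n  # blend weight advances with elapsed time
            frames.append((1 - alpha) * a + alpha * b)
    # Hold the final viseme at the last phoneme onset.
    frames.append(visemes[timings[-1][0]].astype(float))
    return frames
```

Concatenating these per-pair transitions and holding the final viseme mirrors the abstract's step of joining viseme transitions and synchronizing them with the phoneme timings before real-time display.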