    • 6. Granted Invention Patent
    • Title: Method and system for aligning natural and synthetic video to speech synthesis
    • Publication No.: US07844463B2
    • Publication Date: 2010-11-30
    • Application No.: US12193397
    • Filing Date: 2008-08-18
    • Inventors: Andrea Basso; Mark Charles Beutnagel; Joern Ostermann
    • Applicants: Andrea Basso; Mark Charles Beutnagel; Joern Ostermann
    • IPC: G10L13/00; G06T13/00
    • CPC: G06T9/001; G10L13/00; G10L21/06; H04N21/2368; H04N21/4341
    • Abstract: According to MPEG-4's TTS architecture, facial animation can be driven by two streams simultaneously: text and Facial Animation Parameters. A Text-To-Speech converter drives the mouth shapes of the face. An encoder sends Facial Animation Parameters to the face. The text input can include codes, or bookmarks, transmitted to the Text-to-Speech converter, which are placed between and inside words. The bookmarks carry an encoder time stamp. Due to the nature of text-to-speech conversion, the encoder time stamp does not relate to real-world time, and should be interpreted as a counter. The Facial Animation Parameter stream carries the same encoder time stamp found in the bookmark of the text. The system reads the bookmark and provides the encoder time stamp and a real-time time stamp. The facial animation system associates the correct facial animation parameter with the real-time time stamp using the encoder time stamp of the bookmark as a reference.
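The alignment idea in the abstract above can be sketched in a few lines: bookmarks embedded in the text carry an encoder time stamp (ETS, a counter rather than wall-clock time); when the TTS engine reaches a bookmark it reports a real-time time stamp (RTS), and FAP frames tagged with the same ETS are then scheduled at that RTS. This is a hedged illustration only; the `\bm{N}` bookmark syntax, function names, and timings are assumptions, not the patent's actual encoding.

```python
import re

# Illustrative bookmark syntax: \bm{ETS} embedded in the text stream.
BOOKMARK = re.compile(r"\\bm\{(\d+)\}")

def extract_bookmarks(marked_text):
    """Strip bookmarks from the text; return (plain_text, [(char_offset, ets), ...])."""
    plain, marks = [], []
    last = 0
    for m in BOOKMARK.finditer(marked_text):
        plain.append(marked_text[last:m.start()])
        offset = sum(len(p) for p in plain)  # position of the bookmark in plain text
        marks.append((offset, int(m.group(1))))
        last = m.end()
    plain.append(marked_text[last:])
    return "".join(plain), marks

def align_faps(marks, tts_reached_at, fap_frames):
    """Map each FAP frame's ETS to the RTS at which the TTS engine hit that bookmark.

    tts_reached_at: {char_offset: rts_seconds} reported by the TTS engine.
    fap_frames:     [(ets, fap_payload), ...] from the encoder.
    Returns [(rts_seconds, fap_payload), ...] ready for playout.
    """
    ets_to_rts = {ets: tts_reached_at[off] for off, ets in marks}
    return [(ets_to_rts[ets], fap) for ets, fap in fap_frames]
```

For example, with `r"Hello \bm{1}world"` the bookmark sits at character offset 6 of the plain text, so a FAP frame tagged with ETS 1 is scheduled at whatever real time the TTS engine reported for that offset.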
    • 7. Granted Invention Patent
    • Title: Synthetic audiovisual description scheme, method and system for MPEG-7
    • Publication No.: US07673239B2
    • Publication Date: 2010-03-02
    • Application No.: US10609737
    • Filing Date: 2003-06-30
    • Inventors: Qian Huang; Joern Ostermann; Atul Puri; Raj Kumar Rajendran
    • Applicants: Qian Huang; Joern Ostermann; Atul Puri; Raj Kumar Rajendran
    • IPC: G06T15/00
    • CPC: G06F17/30017; G06F17/30858; G06F17/30864; H04N1/00209
    • Abstract: A method and system for description of synthetic audiovisual content makes it easier for humans, software components, or devices to identify, manage, categorize, search, browse, and retrieve such content. For instance, a user may wish to search for specific synthetic audiovisual objects in digital libraries, Internet web sites, or broadcast media; such a search is enabled by the invention. Key characteristics of the synthetic audiovisual content itself, such as the underlying 2d or 3d models and the parameters for animating these models, are used to describe it. More precisely, to represent features of synthetic audiovisual content, a number of descriptors are selected and assigned values, depending on the description scheme to be used. The description scheme instantiated with descriptor values is used to generate the description, which is then stored for actual use during query/search. Typically, to search for needed synthetic audiovisual content, a user initiates a query that is passed to a search engine, which then retrieves candidate content from one or more databases whose descriptions closely match the query criteria specified by the user.
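The descriptor workflow described above (instantiate a description scheme with descriptor values, store it, then match stored descriptions against a query) can be sketched as a small data structure plus a filter. This is a minimal illustration under assumed field names; `model_type` and `animation_params` are hypothetical descriptors, not normative MPEG-7 ones.

```python
from dataclasses import dataclass

@dataclass
class SyntheticAVDescription:
    """A description scheme instantiated with descriptor values for one content item."""
    content_id: str
    model_type: str          # e.g. "2d" or "3d" (assumed descriptor)
    animation_params: dict   # e.g. {"param_set": "FAP"} (assumed descriptor)

def search(descriptions, **criteria):
    """Return ids of content whose descriptor values match every query criterion."""
    hits = []
    for d in descriptions:
        values = {"model_type": d.model_type, **d.animation_params}
        if all(values.get(k) == v for k, v in criteria.items()):
            hits.append(d.content_id)
    return hits
```

A real search engine would index the stored descriptions rather than scan them linearly, but the matching logic (query criteria against descriptor values) is the same.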
    • 8. Granted Invention Patent
    • Title: System for low-latency animation of talking heads
    • Publication No.: US07627478B2
    • Publication Date: 2009-12-01
    • Application No.: US11778228
    • Filing Date: 2007-07-16
    • Inventors: Eric Cosatto; Hans Peter Graf; Joern Ostermann
    • Applicants: Eric Cosatto; Hans Peter Graf; Joern Ostermann
    • IPC: G10L21/00
    • CPC: G06F17/30905
    • Abstract: Methods and apparatus for rendering a talking head on a client device are disclosed. The client device has a client cache capable of storing audio/visual data associated with rendering the talking head. The method comprises storing, in the client cache, sentences that bridge delays in a dialog and sentence templates to be used in dialogs; generating a talking head response to a user inquiry from the client device; and determining whether stored sentences or templates in the client cache relate to the talking head response. If they do, the method comprises instructing the client device to use the appropriate stored sentence or template from the client cache to render at least part of the talking head response, and transmitting any portion of the response not stored in the client cache to the client device so that it can render a complete talking head response. If the client cache has no stored data associated with the talking head response, the method comprises transmitting the entire talking head response to be rendered on the client device.
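The caching strategy in the abstract above reduces latency by letting the client render cached bridge sentences immediately while the server transmits only the uncached remainder. A minimal sketch of the server-side planning step, under assumed names (sentence-level granularity is an illustrative simplification):

```python
def plan_response(response_sentences, client_cache):
    """Split a talking-head response into per-sentence render instructions.

    response_sentences: ordered sentences of the generated response.
    client_cache: set of sentences/templates known to be cached on the client.
    Returns [("cache" | "transmit", sentence), ...] in playout order.
    """
    plan = []
    for s in response_sentences:
        if s in client_cache:
            plan.append(("cache", s))      # client renders from its own cache
        else:
            plan.append(("transmit", s))   # server must send audio/visual data
    return plan
```

For example, a cached bridging phrase like "One moment, please." plays back with no network round trip while the server streams the dynamically generated part of the answer.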
    • 10. Granted Invention Patent
    • Title: Method and system for aligning natural and synthetic video to speech synthesis
    • Publication No.: US07366670B1
    • Publication Date: 2008-04-29
    • Application No.: US11464018
    • Filing Date: 2006-08-11
    • Inventors: Andrea Basso; Mark Charles Beutnagel; Joern Ostermann
    • Applicants: Andrea Basso; Mark Charles Beutnagel; Joern Ostermann
    • IPC: G10L13/00; G06T13/00
    • CPC: G06T9/001; G10L13/00; G10L21/06; H04N21/2368; H04N21/4341
    • Abstract: Facial animation in MPEG-4 can be driven by a text stream and a Facial Animation Parameters (FAP) stream. Text input is sent to a TTS converter that drives the mouth shapes of the face. FAPs are sent from an encoder to the face over the communication channel. Disclosed are code bookmarks placed in the text string transmitted to the TTS converter. Bookmarks are placed between and inside words and carry an encoder time stamp. The encoder time stamp does not relate to real-world time. The FAP stream carries the same encoder time stamp found in the bookmark of the text. The system reads the bookmark and provides the encoder time stamp as well as a real-time time stamp to the facial animation system. The facial animation system associates the correct facial animation parameter with the real-time time stamp using the encoder time stamp of the bookmark as a reference.