Quick Patent Search - Fast search of global patents, free patent database for commercial use - IPRDB

1. Granted invention patent

US10909329B2 Multilingual image question answering (Active)
Publication No.: US10909329B2
Publication date: 2021-02-02
Application No.: US15137179
Filing date: 2016-04-25
Applicant: Baidu USA, LLC
Inventors: Haoyuan Gao, Junhua Mao, Jie Zhou, Zhiheng Huang, Lei Wang, Wei Xu
IPC classification: G06F40/56, G06N3/04, G06N5/04
Abstract: Embodiments of a multimodal question answering (mQA) system are presented to answer a question about the content of an image. In embodiments, the model comprises four components: a Long Short-Term Memory (LSTM) component to extract the question representation; a Convolutional Neural Network (CNN) component to extract the visual representation; an LSTM component for storing the linguistic context in an answer; and a fusing component to combine the information from the first three components and generate the answer. A Freestyle Multilingual Image Question Answering (FM-IQA) dataset was constructed to train and evaluate embodiments of the mQA model. The quality of the generated answers of the mQA model on this dataset is evaluated by human judges through a Turing Test.
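The fusing component described in this abstract can be pictured with a small sketch. The abstract does not say how the three representations are combined, so the additive fusion, the tanh non-linearity, the softmax output, and every name and dimension below are assumptions made for illustration. The three input vectors are random stand-ins for the outputs of the question LSTM, the CNN, and the answer LSTM, not real encoders.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def rand_matrix(rows, cols, rng):
    """Small random weight matrix (untrained, for illustration only)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(question_vec, image_vec, answer_ctx_vec, params):
    """Hypothetical fusing step: project the three inputs into a shared
    space, add them, squash with tanh, and score each vocabulary word."""
    W_q, W_i, W_a, W_out = params
    hidden = [
        math.tanh(q + i + a)
        for q, i, a in zip(matvec(W_q, question_vec),
                           matvec(W_i, image_vec),
                           matvec(W_a, answer_ctx_vec))
    ]
    return softmax(matvec(W_out, hidden))

rng = random.Random(0)
Q_DIM, I_DIM, A_DIM, H_DIM, VOCAB = 4, 6, 4, 5, 8  # assumed toy sizes
params = (rand_matrix(H_DIM, Q_DIM, rng), rand_matrix(H_DIM, I_DIM, rng),
          rand_matrix(H_DIM, A_DIM, rng), rand_matrix(VOCAB, H_DIM, rng))

# Stand-ins for the question LSTM, CNN, and answer LSTM outputs.
question_vec = [rng.random() for _ in range(Q_DIM)]
image_vec = [rng.random() for _ in range(I_DIM)]
answer_ctx_vec = [rng.random() for _ in range(A_DIM)]

# Probability distribution over the next answer word.
next_word_probs = fuse(question_vec, image_vec, answer_ctx_vec, params)
```

In a full system this step would presumably run once per answer word, with the answer-context vector updated after each word is produced.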

2. Invention application

US20170098153A1 INTELLIGENT IMAGE CAPTIONING (Active)
Publication No.: US20170098153A1
Publication date: 2017-04-06
Application No.: US15166177
Filing date: 2016-05-26
Applicant: Baidu USA, LLC
Inventors: Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang
IPC classification: G06N3/04, G06N3/08
Abstract: Presented herein are embodiments of a multimodal Recurrent Neural Network (m-RNN) model for generating novel image captions. In embodiments, it directly models the probability distribution of generating a word given a previous word or words and an image, and image captions are generated according to this distribution. In embodiments, the model comprises two sub-networks: a deep recurrent neural network for sentences and a deep convolutional network for images. In embodiments, these two sub-networks interact with each other in a multimodal layer to form the whole m-RNN model. The effectiveness of an embodiment of the model was validated on four benchmark datasets, and it outperformed the state-of-the-art methods. In embodiments, the m-RNN model may also be applied to retrieval tasks for retrieving images or captions.
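Generating a caption from a next-word distribution conditioned on the previous words and the image can be sketched as a decoding loop. The abstract does not state the decoding strategy, so greedy selection is an assumption here. `toy_dist` is a hypothetical lookup table standing in for the trained m-RNN, and the image is represented by a file-name string purely for illustration.

```python
from typing import Callable, Dict, List, Sequence

START, END = "<start>", "<end>"

def generate_caption(
    next_word_dist: Callable[[Sequence[str], object], Dict[str, float]],
    image: object,
    max_len: int = 10,
) -> List[str]:
    """Greedy decoding (an assumed strategy): repeatedly pick the most
    probable next word given the words so far and the image, stopping
    at END or after max_len words."""
    words: List[str] = [START]
    for _ in range(max_len):
        dist = next_word_dist(words, image)
        best = max(dist, key=dist.get)
        if best == END:
            break
        words.append(best)
    return words[1:]  # drop the START marker

def toy_dist(words: Sequence[str], image: object) -> Dict[str, float]:
    """Hypothetical stand-in for the trained model: a tiny table keyed on
    the previous word, with the image label biasing the noun choice."""
    prev = words[-1]
    if prev == START:
        return {"a": 0.9, END: 0.1}
    if prev == "a":
        if image == "dog.jpg":
            return {"dog": 0.8, "cat": 0.2}
        return {"cat": 0.8, "dog": 0.2}
    if prev in ("dog", "cat"):
        return {"sleeps": 0.7, END: 0.3}
    return {END: 1.0}

print(generate_caption(toy_dist, "dog.jpg"))  # ['a', 'dog', 'sleeps']
```

Sampling from the distribution or beam search would fit the same interface; only the line that selects `best` would change.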

3. Granted invention patent

US11593612B2 Intelligent image captioning (Active)
Publication No.: US11593612B2
Publication date: 2023-02-28
Application No.: US16544772
Filing date: 2019-08-19
Applicant: BAIDU USA LLC
Inventors: Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang
IPC classification: G06N3/04
Abstract: Presented herein are embodiments of a multimodal Recurrent Neural Network (m-RNN) model for generating novel image captions. In embodiments, it directly models the probability distribution of generating a word given a previous word or words and an image, and image captions are generated according to this distribution. In embodiments, the model comprises two sub-networks: a deep recurrent neural network for sentences and a deep convolutional network for images. In embodiments, these two sub-networks interact with each other in a multimodal layer to form the whole m-RNN model. The effectiveness of an embodiment of the model was validated on four benchmark datasets, and it outperformed the state-of-the-art methods. In embodiments, the m-RNN model may also be applied to retrieval tasks for retrieving images or captions.

4. Granted invention patent

US10504010B2 Systems and methods for fast novel visual concept learning from sentence descriptions of images (Active)
Publication No.: US10504010B2
Publication date: 2019-12-10
Application No.: US15418401
Filing date: 2017-01-27
Applicant: Baidu USA, LLC
Inventors: Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang
IPC classification: G06K9/72, G06N3/04, G06K9/62, G06F17/27
CPC classification: G06K9/726, G06K9/627, G06N3/0445, G06N3/0454, G06F17/2785
Abstract: Described herein are systems and methods that address the task of learning novel visual concepts, and their interactions with other concepts, from a few images with sentence descriptions. Using linguistic context and visual features, embodiments are able to efficiently hypothesize the semantic meaning of new words and add them to model word dictionaries so that they can be used to describe images which contain these novel concepts. In the experiments, it was shown that the tested embodiments effectively learned novel visual concepts from a few examples without disturbing the previously learned concepts.

5. Invention application

US20170127016A1 SYSTEMS AND METHODS FOR VIDEO PARAGRAPH CAPTIONING USING HIERARCHICAL RECURRENT NEURAL NETWORKS (Pending, published)
Publication No.: US20170127016A1
Publication date: 2017-05-04
Application No.: US15183678
Filing date: 2016-06-15
Applicant: Baidu USA, LLC
Inventors: Haonan Yu, Jiang Wang, Zhiheng Huang, Yi Yang, Wei Xu
IPC classification: H04N7/035, G06F17/21
CPC classification: G06K9/00711, G06N3/0445, G06N3/084
Abstract: Described herein are systems and methods that exploit hierarchical Recurrent Neural Networks (RNNs) to tackle the video captioning problem; that is, generating one or multiple sentences to describe a realistic video. Embodiments of the hierarchical framework comprise a sentence generator and a paragraph generator. In embodiments, the sentence generator produces one simple short sentence that describes a specific short video interval. In embodiments, it exploits both temporal- and spatial-attention mechanisms to selectively focus on visual elements during generation. In embodiments, the paragraph generator captures the inter-sentence dependency by taking as input the sentential embedding produced by the sentence generator, combining it with the paragraph history, and outputting the new initial state for the sentence generator.
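The interplay between the sentence generator and the paragraph generator can be sketched as a loop over video intervals. This is only a control-flow illustration under stated assumptions: both generators are toy functions rather than RNNs, the attention mechanisms are omitted, clips are represented by short text labels, and all names and the update formulas are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]

def sentence_generator(init_state: Vector, clip: str) -> Tuple[str, Vector]:
    """Toy stand-in: produce one short sentence for a clip, plus a
    sentence embedding that depends on the initial state."""
    sentence = f"Someone {clip}."
    embedding = [math.tanh(s + len(clip) / 10.0) for s in init_state]
    return sentence, embedding

def paragraph_generator(history: Vector, sent_emb: Vector) -> Tuple[Vector, Vector]:
    """Toy stand-in: combine the sentence embedding with the paragraph
    history and return (updated history, next initial state)."""
    new_history = [math.tanh(0.5 * h + 0.5 * e) for h, e in zip(history, sent_emb)]
    next_init = list(new_history)
    return new_history, next_init

def caption_video(clips: List[str], dim: int = 3) -> List[str]:
    """Describe each clip in turn, threading paragraph state between sentences."""
    history: Vector = [0.0] * dim
    init_state: Vector = [0.0] * dim
    sentences: List[str] = []
    for clip in clips:
        sentence, emb = sentence_generator(init_state, clip)
        # The paragraph level decides how the next sentence starts.
        history, init_state = paragraph_generator(history, emb)
        sentences.append(sentence)
    return sentences

paragraph = caption_video(["chops onions", "heats a pan"])
print(paragraph)  # ['Someone chops onions.', 'Someone heats a pan.']
```

The point of the sketch is the data flow: each sentence's embedding feeds the paragraph level, which in turn supplies the initial state for the next sentence.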

6. Invention application

US20160342895A1 MULTILINGUAL IMAGE QUESTION ANSWERING (Pending, published)
Publication No.: US20160342895A1
Publication date: 2016-11-24
Application No.: US15137179
Filing date: 2016-04-25
Applicant: Baidu USA, LLC
Inventors: Haoyuan Gao, Junhua Mao, Jie Zhou, Zhiheng Huang, Lei Wang, Wei Xu
IPC classification: G06N5/02, G06F17/27
CPC classification: G06F17/2881, G06N3/0445, G06N3/0454, G06N5/04
Abstract: Embodiments of a multimodal question answering (mQA) system are presented to answer a question about the content of an image. In embodiments, the model comprises four components: a Long Short-Term Memory (LSTM) component to extract the question representation; a Convolutional Neural Network (CNN) component to extract the visual representation; an LSTM component for storing the linguistic context in an answer; and a fusing component to combine the information from the first three components and generate the answer. A Freestyle Multilingual Image Question Answering (FM-IQA) dataset was constructed to train and evaluate embodiments of the mQA model. The quality of the generated answers of the mQA model on this dataset is evaluated by human judges through a Turing Test.

7. Granted invention patent

US10423874B2 Intelligent image captioning (Active)
Publication No.: US10423874B2
Publication date: 2019-09-24
Application No.: US15166177
Filing date: 2016-05-26
Applicant: Baidu USA, LLC
Inventors: Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang
IPC classification: G06N3/04
Abstract: Presented herein are embodiments of a multimodal Recurrent Neural Network (m-RNN) model for generating novel image captions. In embodiments, it directly models the probability distribution of generating a word given a previous word or words and an image, and image captions are generated according to this distribution. In embodiments, the model comprises two sub-networks: a deep recurrent neural network for sentences and a deep convolutional network for images. In embodiments, these two sub-networks interact with each other in a multimodal layer to form the whole m-RNN model. The effectiveness of an embodiment of the model was validated on four benchmark datasets, and it outperformed the state-of-the-art methods. In embodiments, the m-RNN model may also be applied to retrieval tasks for retrieving images or captions.

8. Granted invention patent

US10395118B2 Systems and methods for video paragraph captioning using hierarchical recurrent neural networks (Active)
Publication No.: US10395118B2
Publication date: 2019-08-27
Application No.: US15183678
Filing date: 2016-06-15
Applicant: Baidu USA, LLC
Inventors: Haonan Yu, Jiang Wang, Zhiheng Huang, Yi Yang, Wei Xu
IPC classification: G06N3/04, G06K9/00, G06N3/08
Abstract: Described herein are systems and methods that exploit hierarchical Recurrent Neural Networks (RNNs) to tackle the video captioning problem; that is, generating one or multiple sentences to describe a realistic video. Embodiments of the hierarchical framework comprise a sentence generator and a paragraph generator. In embodiments, the sentence generator produces one simple short sentence that describes a specific short video interval. In embodiments, it exploits both temporal- and spatial-attention mechanisms to selectively focus on visual elements during generation. In embodiments, the paragraph generator captures the inter-sentence dependency by taking as input the sentential embedding produced by the sentence generator, combining it with the paragraph history, and outputting the new initial state for the sentence generator.

9. Invention application

US20170147910A1 SYSTEMS AND METHODS FOR FAST NOVEL VISUAL CONCEPT LEARNING FROM SENTENCE DESCRIPTIONS OF IMAGES (Pending, published)
Publication No.: US20170147910A1
Publication date: 2017-05-25
Application No.: US15418401
Filing date: 2017-01-27
Applicant: Baidu USA, LLC
Inventors: Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang
IPC classification: G06K9/72, G06N3/08, G06K9/62, G06N3/04
CPC classification: G06K9/726, G06F17/2785, G06K9/627, G06N3/0445, G06N3/0454
Abstract: Described herein are systems and methods that address the task of learning novel visual concepts, and their interactions with other concepts, from a few images with sentence descriptions. Using linguistic context and visual features, embodiments are able to efficiently hypothesize the semantic meaning of new words and add them to model word dictionaries so that they can be used to describe images which contain these novel concepts. In the experiments, it was shown that the tested embodiments effectively learned novel visual concepts from a few examples without disturbing the previously learned concepts.
