Quick Patent Search: fast search of global patents, free commercial-use patent database (IPRDB)

1. Patent application

US20110078554A1 WEBPAGE ENTITY EXTRACTION THROUGH JOINT UNDERSTANDING OF PAGE STRUCTURES AND SENTENCES (legal status: in force)
Publication No.: US20110078554A1
Publication date: 2011-03-31
Application No.: US12569912
Filing date: 2009-09-30
Applicants: Zaiqing Nie, Yong Cao, Ji-Rong Wen, Chunyu Yang
Inventors: Zaiqing Nie, Yong Cao, Ji-Rong Wen, Chunyu Yang
IPC classification: G06F17/21
CPC classification: G06F17/278
Abstract: Described is a technology for understanding entities of a webpage, e.g., to label the entities on the webpage. An iterative and bidirectional framework processes a webpage, including a text understanding component (e.g., extended Semi-CRF model) that provides text segmentation features to a structure understanding component (e.g., extended HCRF model). The structure understanding component uses the text segmentation features and visual layout features of the webpage to identify a structure (e.g., labeled block). The text understanding component in turn uses the labeled block to further understand the text. The process continues iteratively until a similarity criterion is met, at which time the entities may be labeled. Also described is the use of multiple mentions of a set of text in the webpage to help in labeling an entity.
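
The iterative, bidirectional loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: `joint_extract`, `agreement`, and the two toy models are hypothetical names invented here, the toy models are simple rule-based stand-ins for the extended Semi-CRF and extended HCRF models, visual layout features are omitted, and the "fraction of unchanged labels" similarity criterion is an assumption, since the abstract does not define the criterion.

```python
from typing import Callable, Dict, List, Optional, Tuple

Segmentation = List[Tuple[str, str]]  # (text span, entity label) pairs
BlockLabels = Dict[str, str]          # block id -> structure label
Page = Dict[str, str]                 # block id -> block text (toy page model)


def agreement(a: Segmentation, b: Segmentation) -> float:
    """Assumed similarity criterion: fraction of (span, label) pairs unchanged."""
    if not a and not b:
        return 1.0
    same = sum(1 for x, y in zip(a, b) if x == y)
    return same / max(len(a), len(b))


def joint_extract(
    page: Page,
    text_model: Callable[[Page, Optional[BlockLabels]], Segmentation],
    structure_model: Callable[[Page, Segmentation], BlockLabels],
    threshold: float = 1.0,
    max_iters: int = 10,
) -> Tuple[Segmentation, BlockLabels]:
    # First pass: text understanding runs without any structure hints.
    segmentation = text_model(page, None)
    block_labels = structure_model(page, segmentation)
    for _ in range(max_iters):
        # Text understanding re-reads the page using the labeled blocks.
        updated = text_model(page, block_labels)
        converged = agreement(segmentation, updated) >= threshold
        segmentation = updated
        if converged:
            break
        # Structure understanding re-labels blocks from the new segmentation.
        block_labels = structure_model(page, segmentation)
    return segmentation, block_labels


def toy_text_model(page: Page, block_labels: Optional[BlockLabels]) -> Segmentation:
    """Hypothetical stand-in for the text understanding component."""
    out = []
    for block_id, text in page.items():
        hint = (block_labels or {}).get(block_id)
        if text.istitle():
            out.append((text, "PERSON" if hint == "contact_block" else "CAPITALIZED"))
        else:
            out.append((text, "OTHER"))
    return out


def toy_structure_model(page: Page, segmentation: Segmentation) -> BlockLabels:
    """Hypothetical stand-in for the structure understanding component."""
    labels = {}
    for block_id, (_, seg_label) in zip(page, segmentation):
        is_name_like = seg_label in ("CAPITALIZED", "PERSON")
        labels[block_id] = "contact_block" if is_name_like else "body_block"
    return labels


page = {"b1": "Jane Doe", "b2": "call us today"}
entities, blocks = joint_extract(page, toy_text_model, toy_structure_model)
```

In this toy run the first text pass can only say that "Jane Doe" is capitalized. Once the structure pass labels its block as a contact block, the second text pass upgrades the span to a person name, and the third pass produces no further change, so the loop stops.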

2. Granted patent

US09092424B2 Webpage entity extraction through joint understanding of page structures and sentences (legal status: in force)
Publication No.: US09092424B2
Publication date: 2015-07-28
Application No.: US12569912
Filing date: 2009-09-30
Applicants: Zaiqing Nie, Yong Cao, Ji-Rong Wen, Chunyu Yang
Inventors: Zaiqing Nie, Yong Cao, Ji-Rong Wen, Chunyu Yang
IPC classification: G06F17/00, G06F17/27
CPC classification: G06F17/278
Abstract: Described is a technology for understanding entities of a webpage, e.g., to label the entities on the webpage. An iterative and bidirectional framework processes a webpage, including a text understanding component (e.g., extended Semi-CRF model) that provides text segmentation features to a structure understanding component (e.g., extended HCRF model). The structure understanding component uses the text segmentation features and visual layout features of the webpage to identify a structure (e.g., labeled block). The text understanding component in turn uses the labeled block to further understand the text. The process continues iteratively until a similarity criterion is met, at which time the entities may be labeled. Also described is the use of multiple mentions of a set of text in the webpage to help in labeling an entity.
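
The abstract's last sentence, on using multiple mentions of the same text to help label an entity, can be illustrated with a small sketch. The abstract does not say how mentions are combined, so the majority vote below is purely an assumption, and `harmonize_mentions` is a hypothetical helper name, not part of the patent.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def harmonize_mentions(mentions: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Relabel every mention of a text with the label most of its mentions received.

    `mentions` is a list of (text, label) pairs in page order. Matching is
    case-insensitive, which is an assumption made for this sketch.
    """
    votes: Dict[str, Counter] = defaultdict(Counter)
    for text, label in mentions:
        votes[text.lower()][label] += 1
    # Pick the most frequent label for each distinct text.
    winner = {key: counts.most_common(1)[0][0] for key, counts in votes.items()}
    return [(text, winner[text.lower()]) for text, _ in mentions]


labeled = harmonize_mentions([
    ("Jane Doe", "PERSON"),  # e.g. a mention in a contact block
    ("Jane Doe", "ORG"),     # an ambiguous mention elsewhere on the page
    ("jane doe", "PERSON"),
    ("Acme", "ORG"),
])
```

Here the one mention of "Jane Doe" that was labeled as an organization is outvoted by the two mentions labeled as a person, so all three end up with the same label.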
