会员体验
专利管家(专利管理)
工作空间(专利管理)
风险监控(情报监控)
数据分析(专利分析)
侵权分析(诉讼无效)
联系我们
交流群
官方交流:
QQ群: 891211   
微信请扫码    >>>
现在联系顾问~
热词
    • 9. 发明授权
    • Webpage entity extraction through joint understanding of page structures and sentences
    • 网页实体提取通过联合理解页面结构和句子
    • US09092424B2
    • 2015-07-28
    • US12569912
    • 2009-09-30
    • Zaiqing NieYong CaoJi-Rong WenChunyu Yang
    • Zaiqing NieYong CaoJi-Rong WenChunyu Yang
    • G06F17/00G06F17/27
    • G06F17/278
    • Described is a technology for understanding entities of a webpage, e.g., to label the entities on the webpage. An iterative and bidirectional framework processes a webpage, including a text understanding component (e.g., extended Semi-CRF model) that provides text segmentation features to a structure understanding component (e.g., extended HCRF model). The structure understanding component uses the text segmentation features and visual layout features of the webpage to identify a structure (e.g., labeled block). The text understanding component in turn uses the labeled block to further understand the text. The process continues iteratively until a similarity criterion is met, at which time the entities may be labeled. Also described is the use of multiple mentions of a set of text in the webpage to help in labeling an entity.
    • 描述了一种用于理解网页的实体的技术,例如标记网页上的实体。 迭代和双向框架处理网页,包括向结构理解组件(例如,扩展HCRF模型)提供文本分段特征的文本理解组件(例如,扩展Semi-CRF模型)。 结构理解组件使用网页的文本分割特征和视觉布局特征来识别结构(例如,标记块)。 文本理解组件依次使用标记块来进一步理解文本。 该过程继续迭代直到满足相似性标准,此时实体可以被标记。 还描述了使用多个提及网页中的一组文本来帮助标注一个实体。