    • 1. Patent application
    • Visual and interactive wrapper generation, automated information extraction from web pages, and translation into XML
    • Publication: US20050022115A1, 2005-01-27
    • Application: US10479039, 2002-05-28
    • Inventors: Robert Baumgartner, Sergio Flesca, Georg Gottlob, Marcus Herzog
    • IPC: G06F15/00, G06F17/30
    • CPC: G06F17/30911, G06F17/30867
    • A method and a system for information extraction from Web pages formatted with markup languages such as HTML [8]. A method and system for interactively and visually describing information patterns of interest based on visualized sample Web pages [5,6,16-29]. A method and data structure for representing and storing these patterns [1]. A method and system for extracting information corresponding to a set of previously defined patterns from Web pages [2], and a method for transforming the extracted data into XML, are described. Each pattern is defined via the (interactive) specification of one or more filters. Two or more filters for the same pattern contribute disjunctively to the pattern definition [3]; that is, an actual pattern describes the set of all targets specified by any of its filters. A method and system for extracting relevant elements from Web pages by interpreting and executing a previously defined wrapper program of the above form on an input Web page [9-14] and producing as output the extracted elements represented in a suitable data structure. A method and system for automatically translating said output into XML format by exploiting the hierarchical structure of the patterns and by using pattern names as XML tags is described.
    • 2. Granted patent
    • Visual and interactive wrapper generation, automated information extraction from Web pages, and translation into XML
    • Publication: US07581170B2, 2009-08-25
    • Application: US10479039, 2002-05-28
    • Inventors: Robert Baumgartner, Sergio Flesca, Georg Gottlob, Marcus Herzog
    • IPC: G06N3/00
    • CPC: G06F17/30911, G06F17/30867
    • A method and a system for information extraction from Web pages formatted with markup languages such as HTML [8]. A method and system for interactively and visually describing information patterns of interest based on visualized sample Web pages [5,6,16-29]. A method and data structure for representing and storing these patterns [1]. A method and system for extracting information corresponding to a set of previously defined patterns from Web pages [2], and a method for transforming the extracted data into XML, are described. Each pattern is defined via the (interactive) specification of one or more filters. Two or more filters for the same pattern contribute disjunctively to the pattern definition [3]; that is, an actual pattern describes the set of all targets specified by any of its filters. A method and system for extracting relevant elements from Web pages by interpreting and executing a previously defined wrapper program of the above form on an input Web page [9-14] and producing as output the extracted elements represented in a suitable data structure. A method and system for automatically translating said output into XML format by exploiting the hierarchical structure of the patterns and by using pattern names as XML tags is described.
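The abstract describes patterns defined by one or more filters whose matches combine disjunctively, arranged hierarchically, and translated into XML using pattern names as tags. The Python sketch below illustrates that idea under simplifying assumptions and is not the patented implementation. The classes `Filter` and `Pattern` and the function `run_wrapper` are hypothetical names; a filter here only matches a tag name plus an optional `class` attribute (the patented filters are far richer), and the input page is assumed to be well-formed XHTML so that `xml.etree.ElementTree` can parse it.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Filter:
    """Hypothetical, simplified filter: descendants with a tag and optional class."""
    tag: str
    cls: Optional[str] = None

    def matches(self, parent: ET.Element) -> list[ET.Element]:
        # Element.iter() also yields `parent` itself, so exclude it explicitly.
        return [
            e for e in parent.iter(self.tag)
            if e is not parent and (self.cls is None or e.get("class") == self.cls)
        ]


@dataclass
class Pattern:
    name: str              # also used as the XML tag of each extracted instance
    parent: Optional[str]  # parent pattern name; None means the document root
    filters: list[Filter] = field(default_factory=list)


def run_wrapper(patterns: list[Pattern], page: ET.Element) -> ET.Element:
    """Apply patterns (parents listed before children) and build the XML output."""
    order = {id(e): i for i, e in enumerate(page.iter())}  # document order
    parents = {p.parent for p in patterns}
    out_root = ET.Element("document")
    # Pattern name -> list of (source element, output element) instances.
    instances = {None: [(page, out_root)]}
    for p in patterns:
        instances[p.name] = []
        for src_parent, out_parent in instances[p.parent]:
            # Disjunctive semantics: union of the targets of all filters,
            # deduplicated by identity and kept in document order.
            targets = {id(e): e for f in p.filters for e in f.matches(src_parent)}
            for e in sorted(targets.values(), key=lambda el: order[id(el)]):
                node = ET.SubElement(out_parent, p.name)  # pattern name as tag
                if p.name not in parents:  # leaf pattern: keep its text content
                    node.text = "".join(e.itertext()).strip()
                instances[p.name].append((e, node))
    return out_root


# Hypothetical sample page: two table rows plus one differently formatted box.
PAGE = """<html><body>
<table>
  <tr class="item"><td class="name">Widget</td><td class="price">9.99</td></tr>
  <tr class="item"><td class="name">Gadget</td><td class="price">19.50</td></tr>
</table>
<div class="offer"><span class="name">Gizmo</span><span class="price">4.25</span></div>
</body></html>"""

wrapper = [
    # Two filters for one pattern: table rows OR offer boxes.
    Pattern("record", None, [Filter("tr", "item"), Filter("div", "offer")]),
    Pattern("name", "record", [Filter("td", "name"), Filter("span", "name")]),
    Pattern("price", "record", [Filter("td", "price"), Filter("span", "price")]),
]
result = run_wrapper(wrapper, ET.fromstring(PAGE))
print(ET.tostring(result, encoding="unicode"))
# <document><record><name>Widget</name><price>9.99</price></record><record><name>Gadget</name><price>19.50</price></record><record><name>Gizmo</name><price>4.25</price></record></document>
```

Because `record` has two filters, both the table rows and the `offer` box become `record` instances, mirroring the disjunctive filter semantics in the abstract; the nesting of `name` and `price` under `record` carries over directly into the XML hierarchy.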