    • Patent application (pre-grant publication)
    • Data Extraction Method, Computer Program Product and System
    • US20120059859A1
    • 2012-03-08
    • US13258480
    • 2009-11-25
    • Li-Mei Jiao; Yuhong Xiong
    • G06F17/30
    • G06F17/30896
    • Disclosed is a method of automatically extracting data from a target web page, comprising selecting (302) data in a source web page; determining (304) the respective DOM (Document Object Model) trees of the source and target web pages, and identifying the one or more nodes comprising the selected data in the source web page DOM tree; determining (306) matching paths in the respective DOM trees; for selected data in a node of an unmatched branch of the source web page DOM tree, identifying (308) the nearest matched path in the source web page; identifying (310) the unmatched branch nearest to the corresponding matched path in the target web page; determining (312) whether said identified unmatched branch in the target web page DOM tree comprises a target node matching the selected data node; and, if so, extracting (322) data from the target node if the mismatch between the respective unmatched branches does not exceed a predefined threshold. A computer program product and system implementing this method are also disclosed.
    • Granted patent
    • Data extraction method, computer program product and system
    • US08667015B2
    • 2014-03-04
    • US13258480
    • 2009-11-25
    • Li-Mei Jiao; Yuhong Xiong
    • G06F17/30
    • G06F17/30896
    • Disclosed is a method of automatically extracting data from a target web page, comprising selecting (302) data in a source web page; determining (304) the respective DOM (Document Object Model) trees of the source and target web pages, and identifying the one or more nodes comprising the selected data in the source web page DOM tree; determining (306) matching paths in the respective DOM trees; for selected data in a node of an unmatched branch of the source web page DOM tree, identifying (308) the nearest matched path in the source web page; identifying (310) the unmatched branch nearest to the corresponding matched path in the target web page; determining (312) whether said identified unmatched branch in the target web page DOM tree comprises a target node matching the selected data node; and, if so, extracting (322) data from the target node if the mismatch between the respective unmatched branches does not exceed a predefined threshold. A computer program product and system implementing this method are also disclosed.
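The steps described in the abstract, (306) through (322), can be illustrated with a minimal Python sketch. It is not the patented implementation, and it relies on several assumptions that are not in the abstract. Selection (302) and DOM construction (304) are taken as given, so the pages arrive as hand-built `Node` trees rather than parsed HTML. A "path" is taken to be the sequence of (tag, position among same-tag siblings) pairs from the root, and a path counts as "matched" when the same path exists in both trees. The nearest unmatched target branch is chosen by sibling position. The branch "mismatch" is computed as 1 minus the `difflib.SequenceMatcher` ratio over each branch's pre-order tag sequence. `Node`, `el`, `path`, `extract`, and the 0.3 default threshold are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Iterator, Optional

Step = tuple[str, int]  # assumed path step: (tag, index among same-tag siblings)


@dataclass(eq=False)  # identity equality: two identical <td> elements stay distinct
class Node:
    tag: str
    text: str = ""
    children: list[Node] = field(default_factory=list)
    parent: Optional[Node] = field(default=None, repr=False)


def el(tag: str, text: str = "", *kids: Node) -> Node:
    """Hypothetical helper: build a node and link its children to it."""
    node = Node(tag, text)
    for kid in kids:
        kid.parent = node
        node.children.append(kid)
    return node


def walk(node: Node) -> Iterator[Node]:
    """Pre-order traversal of a subtree."""
    yield node
    for child in node.children:
        yield from walk(child)


def path(node: Optional[Node], stop: Optional[Node] = None) -> tuple[Step, ...]:
    """Tag path from just below `stop` (or from the root) down to `node`."""
    steps: list[Step] = []
    while node is not None and node is not stop:
        siblings = node.parent.children if node.parent else [node]
        same_tag = [c for c in siblings if c.tag == node.tag]
        steps.append((node.tag, same_tag.index(node)))  # identity-based index
        node = node.parent
    return tuple(reversed(steps))


def extract(src_root: Node, selected: Node, tgt_root: Node,
            threshold: float = 0.3) -> Optional[str]:
    src = {path(n): n for n in walk(src_root)}
    tgt = {path(n): n for n in walk(tgt_root)}
    matched = src.keys() & tgt.keys()  # (306) paths present in both trees

    if path(selected) in matched:  # selected node already lies on a matched path
        return tgt[path(selected)].text

    # (308) climb to the nearest matched ancestor; `branch` is the unmatched subtree root
    branch, anchor = selected, selected.parent
    while anchor is not None and path(anchor) not in matched:
        branch, anchor = anchor, anchor.parent
    if anchor is None:
        return None
    tgt_anchor = tgt[path(anchor)]

    # (310) unmatched child of the corresponding target node, nearest by sibling position
    candidates = [c for c in tgt_anchor.children if path(c) not in matched]
    if not candidates:
        return None
    pos = anchor.children.index(branch)
    tgt_branch = min(candidates, key=lambda c: abs(tgt_anchor.children.index(c) - pos))

    # (312) look for a node at the same relative path inside the target branch
    rel = path(selected, stop=branch)
    target = next((n for n in walk(tgt_branch)
                   if n.tag == selected.tag and path(n, stop=tgt_branch) == rel), None)
    if target is None:
        return None

    # (322) extract only if the branches' tag sequences differ by at most `threshold`
    ratio = SequenceMatcher(None, [n.tag for n in walk(branch)],
                            [n.tag for n in walk(tgt_branch)]).ratio()
    return target.text if 1 - ratio <= threshold else None


if __name__ == "__main__":
    source = el("html", "", el("body", "", el("h1", "Shop"),
                el("div", "", el("h2", "Widget"), el("p", "$10"), el("span", "in stock"))))
    target = el("html", "", el("body", "", el("h1", "Shop"),
                el("section", "", el("h2", "Gadget"), el("p", "$25"), el("span", "sold out"))))
    price = source.children[0].children[1].children[1]  # the <p> selected in the source page
    print(extract(source, price, target))  # $25
```

In the usage example, the source `<div>` and the target `<section>` are unmatched branches under the matched `<body>`. Their tag sequences share three of four tags, so the mismatch is 0.25. That is within the assumed 0.3 threshold, and the price is taken from the target page. A stricter threshold such as 0.2 would reject the match and return `None`.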