    • 1. Invention application
    • HISTOGRAM CONSTRUCTION FOR STRING DATA
    • Organizational structure for data
    • WO2014176754A1
    • 2014-11-06
    • PCT/CN2013/075033
    • 2013-04-30
    • HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.; LUO, Ge; JIAO, Li-Mei; CAO, Zhao; CHEN, Shimin; GUO, Meng
    • LUO, Ge; JIAO, Li-Mei; CAO, Zhao; CHEN, Shimin; GUO, Meng
    • G06F17/21; G06F9/45
    • G06F17/30516; G06F17/21; G06F17/30327; G06F17/30345; G06F17/3053
    • Methods and systems of generation of histograms for strings are described. In one implementation, a prefix tree having nodes representing prefixes of the strings is generated. For the prefix tree, deploy weights are assigned to the nodes based on lengths of the prefixes represented by sub-tree nodes rooted at the nodes and frequencies of the strings whose prefixes are represented by the sub-tree nodes. Each of the deploy weights of one node is indicative of a maximum weight preserved upon filling the buckets with at least one prefix represented by the sub-tree nodes rooted at that one node. A predefined number of Top-prefixes are determined for filling up the predefined number of buckets. The Top-prefixes are determined based on maximizing a total weight preserved by the prefixes in the buckets and over a maximum number of strings. A histogram is generated based on the deploy weights associated with the Top-prefixes.
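The abstract above describes a prefix tree whose nodes carry string frequencies, from which a fixed number of prefixes is chosen to fill histogram buckets. The sketch below illustrates only that general shape. The names `Node`, `build_trie`, `iter_prefixes`, and `greedy_top_prefixes` are hypothetical, and the score (prefix length times frequency) with a greedy pick is an assumed stand-in, not the deploy-weight optimization the abstract refers to.

```python
class Node:
    """One trie node; count is the number of input strings passing through it."""

    def __init__(self):
        self.children = {}
        self.count = 0


def build_trie(strings):
    """Build a prefix tree in which every node records its prefix frequency."""
    root = Node()
    for s in strings:
        node = root
        for ch in s:
            node = node.children.setdefault(ch, Node())
            node.count += 1
    return root


def iter_prefixes(node, prefix=""):
    """Yield (prefix, frequency) for every non-empty prefix in the trie."""
    for ch, child in node.children.items():
        yield prefix + ch, child.count
        yield from iter_prefixes(child, prefix + ch)


def greedy_top_prefixes(strings, num_buckets):
    """Pick up to num_buckets non-nesting prefixes and count strings per prefix.

    Assumption: score = len(prefix) * frequency, chosen greedily. This is an
    illustrative simplification, not the method claimed in the patent.
    """
    counts = dict(iter_prefixes(build_trie(strings)))
    ranked = sorted(counts, key=lambda p: (-len(p) * counts[p], p))
    chosen = []
    for prefix in ranked:
        if len(chosen) == num_buckets:
            break
        # Skip a prefix that extends, or is extended by, one already chosen.
        if any(prefix.startswith(c) or c.startswith(prefix) for c in chosen):
            continue
        chosen.append(prefix)
    # Chosen prefixes never nest, so each string falls in at most one bucket.
    return {p: counts[p] for p in chosen}
```

For example, `greedy_top_prefixes(["apple", "apply", "apple", "banana", "band"], 2)` groups the strings under the prefixes `appl` and `ban`. Strings that match no chosen prefix are simply left out of this toy histogram.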
    • 2. Invention application
    • DATA EXTRACTION METHOD, COMPUTER PROGRAM PRODUCT AND SYSTEM
    • WO2011063561A1
    • 2011-06-03
    • PCT/CN2009/075117
    • 2009-11-25
    • HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.; JIAO, Li-Mei; XIONG, Yuhong
    • JIAO, Li-Mei; XIONG, Yuhong
    • G06F17/30
    • G06F17/30896
    • Disclosed is a method of automatically extracting data from a target web page, comprising selecting (302) data in a source web page; determining (304) the respective DOM (document object model) trees of the source and target web page, and identifying the one or more nodes comprising the selected data in the source web page DOM tree; determining (306) matching paths in the respective DOM trees; for selected data in a node of an unmatched branch of the source web page DOM tree, identifying (308) the nearest matched path in the source web page; identifying (310) the unmatched branch nearest to the corresponding matched path in the target web page; determining (312) if said identified unmatched branch in the target web page DOM tree comprises a target node matching the selected data node; and if so: extracting (322) data from the target node if the mismatch between the respective unmatched branches does not exceed a predefined threshold. A computer program product and system implementing this method are also disclosed.
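The abstract above walks through selecting data in a source page, building DOM trees for source and target, and finding matching paths between them. The sketch below covers only that matched-path case (roughly steps 302 to 306). It does not implement the nearest-unmatched-branch fallback or the mismatch threshold (steps 308 to 322). The function names are hypothetical, and the pages are assumed to be well-formed markup so that Python's standard `xml.etree.ElementTree` parser can stand in for a real HTML DOM parser.

```python
import xml.etree.ElementTree as ET


def node_paths(root):
    """Map each element's path to the element.

    A path is a tuple of (tag, position among same-tag siblings) pairs,
    starting at the root element.
    """
    paths = {}

    def walk(elem, path):
        paths[path] = elem
        seen = {}
        for child in elem:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            walk(child, path + ((child.tag, seen[child.tag]),))

    walk(root, ((root.tag, 1),))
    return paths


def extract_by_matching_path(source_markup, target_markup, selected_text):
    """Return the target text found at the same path as the selected source text.

    Returns None when the selected text is not found in the source, or when
    its path has no exact match in the target tree.
    """
    source = node_paths(ET.fromstring(source_markup))
    target = node_paths(ET.fromstring(target_markup))
    matched = source.keys() & target.keys()  # paths present in both trees
    for path, elem in source.items():
        if (elem.text or "").strip() == selected_text and path in matched:
            return (target[path].text or "").strip()
    return None


source_page = "<html><body><h1>Widget A</h1><p>$10</p></body></html>"
target_page = "<html><body><h1>Widget B</h1><p>$25</p></body></html>"
price = extract_by_matching_path(source_page, target_page, "$10")
```

Here `price` is the text of the `<p>` element in the target page, because both pages share the path `html/body/p`. If the target wrapped that element in an extra `<div>`, the exact path would no longer match and this sketch would return `None`, which is the situation the unmatched-branch steps in the abstract are meant to handle.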