Patent Quick Search - Quickly search global patents, free patent database for commercial use - IPRDB

1. Invention application

WO2014176754A1 HISTOGRAM CONSTRUCTION FOR STRING DATA (Pending - Published)
Publication (announcement) number: WO2014176754A1
Publication (announcement) date: 2014-11-06
Application number: PCT/CN2013/075033
Filing date: 2013-04-30
Applicants: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.; LUO, Ge; JIAO, Li-Mei; CAO, Zhao; CHEN, Shimin; GUO, Meng
Inventors: LUO, Ge; JIAO, Li-Mei; CAO, Zhao; CHEN, Shimin; GUO, Meng
IPC classification: G06F17/21, G06F9/45
CPC classification: G06F17/30516, G06F17/21, G06F17/30327, G06F17/30345, G06F17/3053
Abstract: Methods and systems of generation of histograms for strings are described. In one implementation, a prefix tree having nodes representing prefixes of the strings is generated. For the prefix tree, deploy weights are assigned to the nodes based on lengths of the prefixes represented by sub-tree nodes rooted at the nodes and frequencies of the strings whose prefixes are represented by the sub-tree nodes. Each of the deploy weights of one node is indicative of a maximum weight preserved upon filling the buckets with at least one prefix represented by the sub-tree nodes rooted at that one node. A predefined number of Top-prefixes are determined for filling up the predefined number of buckets. The Top-prefixes are determined based on maximizing a total weight preserved by the prefixes in the buckets and over a maximum number of strings. A histogram is generated based on the deploy weights associated with the Top-prefixes.
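
The idea in the abstract above can be illustrated with a minimal sketch. It is not the patented method: the weight used here (prefix length times the number of strings sharing that prefix) and the simple top-k ranking are assumptions made for illustration, and the names `Node`, `build_prefix_tree`, `prefix_weights`, and `top_prefixes` are hypothetical. In particular, the sketch ranks prefixes independently and does not model the per-node "deploy weights" over sub-trees that the abstract describes.

```python
class Node:
    """One prefix-tree node; `count` is the number of strings sharing this prefix."""

    def __init__(self):
        self.children = {}
        self.count = 0


def build_prefix_tree(strings):
    """Insert every string character by character, counting visits per node."""
    root = Node()
    for s in strings:
        node = root
        for ch in s:
            node = node.children.setdefault(ch, Node())
            node.count += 1
    return root


def prefix_weights(root):
    """Return (prefix, count, weight) for every non-root node.

    Assumed weight: len(prefix) * count. The abstract only says that weights
    depend on prefix lengths and string frequencies.
    """
    rows = []
    stack = [("", root)]
    while stack:
        prefix, node = stack.pop()
        for ch, child in node.children.items():
            p = prefix + ch
            rows.append((p, child.count, len(p) * child.count))
            stack.append((p, child))
    return rows


def top_prefixes(strings, num_buckets):
    """Pick the `num_buckets` heaviest prefixes (ties broken alphabetically)."""
    rows = prefix_weights(build_prefix_tree(strings))
    rows.sort(key=lambda r: (-r[2], r[0]))
    return rows[:num_buckets]


if __name__ == "__main__":
    for prefix, count, weight in top_prefixes(["apple", "apply", "ape", "bat"], 2):
        print(prefix, count, weight)
```

Because the ranking treats prefixes independently, nested prefixes such as "ap" and "appl" can both be selected; a selection that maximizes the total preserved weight across buckets, as the abstract describes, would need to account for that overlap.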

2. Invention application

WO2011063561A1 DATA EXTRACTION METHOD, COMPUTER PROGRAM PRODUCT AND SYSTEM (Pending - Published)
Publication (announcement) number: WO2011063561A1
Publication (announcement) date: 2011-06-03
Application number: PCT/CN2009/075117
Filing date: 2009-11-25
Applicants: HEWLETT-PACKARD DEVELOPMENT COMPANY, L. P.; JIAO, Li-Mei; XIONG, Yuhong
Inventors: JIAO, Li-Mei; XIONG, Yuhong
IPC classification: G06F17/30
CPC classification: G06F17/30896
Abstract: Disclosed is a method of automatically extracting data from a target web page, comprising selecting (302) data in a source web page; determining (304) the respective DOM (document object model) trees of the source and target web page, and identifying the one or more nodes comprising the selected data in the source web page DOM tree; determining (306) matching paths in the respective DOM trees; for selected data in a node of an unmatched branch of the source web page DOM tree, identifying (308) the nearest matched path in the source web page; identifying (310) the unmatched branch nearest to the corresponding matched path in the target web page; determining (312) if said identified unmatched branch in the target web page DOM tree comprises a target node matching the selected data node; and if so: extracting (322) data from the target node if the mismatch between the respective unmatched branches does not exceed a predefined threshold. A computer program product and system implementing this method are also disclosed.
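
The matching-path part of the abstract above (steps 302 to 306) can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented method: it covers only the case where the selected node lies on a path that exists in both trees, and it omits the unmatched-branch search and mismatch threshold (steps 308 to 322). The functions `index_paths` and `extract_by_matching_path` are hypothetical names, and the pages are assumed to be well-formed XML so that Python's standard `xml.etree.ElementTree` can parse them; real HTML would need an HTML parser.

```python
import xml.etree.ElementTree as ET


def index_paths(root):
    """Map each root-to-node path, e.g. 'html[0]/body[0]/span[0]', to its element.

    The index in brackets counts same-tag siblings, so paths are unique.
    """
    paths = {}

    def walk(elem, path):
        paths[path] = elem
        seen = {}
        for child in elem:
            n = seen.get(child.tag, 0)
            seen[child.tag] = n + 1
            walk(child, f"{path}/{child.tag}[{n}]")

    walk(root, f"{root.tag}[0]")
    return paths


def extract_by_matching_path(source_page, target_page, selected_text):
    """Find the source node holding `selected_text` and return the text of the
    target node at the same path, or None if no matching path exists."""
    src = index_paths(ET.fromstring(source_page))
    tgt = index_paths(ET.fromstring(target_page))
    for path, elem in src.items():
        if (elem.text or "").strip() == selected_text and path in tgt:
            return (tgt[path].text or "").strip()
    return None


if __name__ == "__main__":
    source = "<html><body><h1>Widget A</h1><span>$10</span></body></html>"
    target = "<html><body><h1>Widget B</h1><span>$25</span></body></html>"
    print(extract_by_matching_path(source, target, "$10"))
```

Indexing siblings by tag keeps paths stable when two pages share a template, which is the situation the abstract targets; pages whose structure diverges are where the abstract's unmatched-branch handling would come in.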
