Basic information:
- Patent title: DISTRIBUTED DATA SET STORAGE AND RETRIEVAL
- Patent title (translated from Chinese): Distributed Data Storage and Retrieval
- Application number: PCT/US2016/044309 Filing date: 2016-07-27
- Publication (announcement) number: WO2017019794A1 Publication (announcement) date: 2017-02-02
- Inventors: BOWMAN, Brian Payton; KRUEGER, Steven E.; KNIGHT, Richard Todd; HO, Chih-Wei
- Applicant: SAS INSTITUTE INC.
- Applicant address: SAS Campus Drive Cary, North Carolina 27513 US
- Patent holder: SAS INSTITUTE INC.
- Current patent holder: SAS INSTITUTE INC.
- Current patent holder address: SAS Campus Drive Cary, North Carolina 27513 US
- Agent: KACVINSKY, John F.
- Priority: US62/197,514 20150727; US62/197,519 20150727; US15/220,034 20160726; US15/220,182 20160726; US15/220,192 20160726
- Main classification: G06F12/00
- IPC classification: G06F12/00; G06F17/30; G06N5/02; G06N5/04
Abstract:
An apparatus includes a processor component caused to: retrieve metadata of organization of data within a data set, and map data of organization of data blocks within a data file; receive indications of which node devices are available to perform a processing task with a data set portion; and in response to the data set including partitioned data, compare the quantities of available node devices and of the node devices last involved in storing the data set. In response to a match, for each map entry of the map data: retrieve a hashed identifier for a data sub-block, and a size for each of the data sub-blocks within the corresponding data block; divide the hashed identifier by the quantity of available node devices; compare the modulo value to a designation assigned to each of the available node devices; and provide a pointer to the available node device assigned the matching designation.
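Abstract (translated from Chinese):
An apparatus includes a processor component to: retrieve metadata describing the organization of data within a data set, and map data describing the organization of data blocks within a data file; receive indications of which node devices are available to perform a processing task using a portion of the data set; and, in response to the data set including partitioned data, compare the quantity of available node devices with the quantity of node devices last involved in storing the data set. In response to a match, for each map entry of the map data: retrieve a hashed identifier for a data sub-block and the size of each data sub-block within the corresponding data block; divide the hashed identifier by the quantity of available node devices; compare the modulo value with the designation assigned to each available node device; and provide a pointer to the available node device assigned the matching designation.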
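
The assignment step described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent text: the names `MapEntry`, `Pointer`, and `assign_sub_blocks` are hypothetical, node designations are assumed to be the integers 0 to N-1 in list order, and pointers are assumed to be byte offsets accumulated from the sub-block sizes. What happens when the node counts differ is not covered by the abstract, so the sketch simply returns `None` in that case.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MapEntry:
    """Hypothetical map entry for one data sub-block."""
    hashed_identifier: int
    size: int  # size of the sub-block in bytes


@dataclass(frozen=True)
class Pointer:
    """Hypothetical pointer handed to a node device."""
    node: str    # node device that should retrieve the sub-block
    offset: int  # assumed byte offset of the sub-block in the data file
    size: int    # size of the sub-block in bytes


def assign_sub_blocks(
    entries: List[MapEntry],
    available_nodes: List[str],
    last_node_count: int,
) -> Optional[List[Pointer]]:
    """Assign each sub-block to a node by hashed identifier modulo node count."""
    node_count = len(available_nodes)
    if node_count == 0:
        raise ValueError("at least one available node device is required")

    # Compare the quantity of available nodes with the quantity of nodes
    # last involved in storing the data set; proceed only on a match.
    if node_count != last_node_count:
        return None  # the mismatch case is outside the scope of this sketch

    # Assumption: the node at list position i carries designation i.
    designation_to_node = dict(enumerate(available_nodes))

    pointers = []
    offset = 0  # assumption: sub-blocks are stored back to back
    for entry in entries:
        # Divide the hashed identifier by the node count and keep the remainder.
        modulo = entry.hashed_identifier % node_count
        # The node whose designation equals the modulo value gets the pointer.
        pointers.append(Pointer(designation_to_node[modulo], offset, entry.size))
        offset += entry.size
    return pointers
```

With three nodes and hashed identifiers 10, 7, and 3, the remainders are 1, 1, and 0, so the first two sub-blocks go to the node designated 1 and the third to the node designated 0. Because the remainder depends only on the identifier and the node count, the same sub-block maps to the same designation whenever the node count is unchanged, which is consistent with the count comparison performed beforehand.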