    • 5. Invention patent application
    • PARALLEL DEBUGGING IN A MASSIVELY PARALLEL COMPUTING SYSTEM
    • US20110191633A1
    • 2011-08-04
    • US12697721
    • 2010-02-01
    • Charles Jens Archer; Todd Alan Inglett
    • Charles Jens Archer; Todd Alan Inglett
    • G06F11/263
    • G06F11/263
    • A method and apparatus are described for parallel debugging on the data nodes of a parallel computer system. A data template associated with the debugger can be used as a reference to the data common to the nodes. During program execution, the application or data contained on the compute nodes diverges from the data template at the service node, so that at some time of interest pieces of the data differ from node to node. For debugging, each compute node searches its own memory image for checksum matches with the template. It then produces new data blocks for checksums that do not exist in the data template, along with a template of references to the original data blocks in the template. Examples herein include an application of the rsync protocol, compression, and network broadcast to improve debugging in a massively parallel computer environment.
    • 8. Granted invention patent
    • Template based parallel checkpointing in a massively parallel computer system
    • US07627783B2
    • 2009-12-01
    • US12104224
    • 2008-04-16
    • Charles Jens Archer; Todd Alan Inglett
    • Charles Jens Archer; Todd Alan Inglett
    • G06F11/00
    • G06F11/1438; G06F11/1451
    • A method and apparatus are described for template-based parallel checkpoint saves on a massively parallel supercomputer system, using a parallel variation of the rsync protocol and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a previously produced template checkpoint file that resides in storage. Embodiments herein greatly decrease the amount of data that must be transmitted and stored, which speeds up checkpointing and increases the efficiency of the computer system. Embodiments are directed to a parallel computer system whose nodes are arranged in a cluster with a high-speed interconnect capable of broadcast communication. The checkpoint contains a set of actual small data blocks, with their corresponding checksums, from all nodes in the system. The data blocks may be compressed using conventional lossless data compression algorithms to further reduce the overall checkpoint size.
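Both abstracts rest on the same step. Each node compares its memory image against a shared template by checksum and keeps two things: the data blocks the template lacks, and references to the blocks it shares. The Python sketch below illustrates that idea using rsync's rolling weak checksum, an MD5 strong checksum, and zlib for lossless compression of the new blocks. It is an illustrative assumption, not the patented implementation. The 4-byte block size, the function names, and the ("ref", index) / ("lit", compressed bytes) output format are all hypothetical, and the broadcast of the template signature to the nodes is omitted.

```python
import hashlib
import zlib

BLOCK = 4        # hypothetical block size; real systems would use much larger blocks
MOD = 1 << 16    # modulus for the rsync-style weak checksum


def weak_checksum(data: bytes) -> tuple[int, int]:
    """rsync-style weak checksum (a, b) of one window."""
    a = sum(data) % MOD
    b = sum((len(data) - i) * x for i, x in enumerate(data)) % MOD
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int) -> tuple[int, int]:
    """Slide the BLOCK-sized window one byte to the right in O(1)."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - BLOCK * out_byte + a) % MOD
    return a, b


def template_signature(template: bytes) -> dict:
    """Map weak checksum -> [(strong checksum, block index)] for full template blocks."""
    sig: dict = {}
    for idx in range(len(template) // BLOCK):
        block = template[idx * BLOCK:(idx + 1) * BLOCK]
        sig.setdefault(weak_checksum(block), []).append((hashlib.md5(block).digest(), idx))
    return sig


def encode(image: bytes, sig: dict) -> list:
    """Encode a node's image as template references plus compressed new data."""
    ops: list = []
    literal = bytearray()
    i, n = 0, len(image)
    ab = None  # weak checksum of image[i:i+BLOCK], or None if it must be recomputed
    while i + BLOCK <= n:
        if ab is None:
            ab = weak_checksum(image[i:i + BLOCK])
        match = None
        candidates = sig.get(ab)
        if candidates:  # weak hit: confirm with the strong checksum
            strong = hashlib.md5(image[i:i + BLOCK]).digest()
            match = next((idx for s, idx in candidates if s == strong), None)
        if match is not None:
            if literal:  # flush pending new data before the reference
                ops.append(("lit", zlib.compress(bytes(literal))))
                literal = bytearray()
            ops.append(("ref", match))
            i += BLOCK
            ab = None
        else:  # no match: keep this byte as new data and roll the window
            literal.append(image[i])
            ab = roll(*ab, image[i], image[i + BLOCK]) if i + BLOCK < n else None
            i += 1
    literal.extend(image[i:])  # tail shorter than one block
    if literal:
        ops.append(("lit", zlib.compress(bytes(literal))))
    return ops


def decode(ops: list, template: bytes) -> bytes:
    """Rebuild a node image from its ops and the shared template."""
    out = bytearray()
    for kind, val in ops:
        if kind == "ref":
            out += template[val * BLOCK:(val + 1) * BLOCK]
        else:
            out += zlib.decompress(val)
    return bytes(out)


template = b"AAAABBBBCCCCDDDD"
node_image = b"AAAAxyBBBBCCCCDDDD"
ops = encode(node_image, template_signature(template))
print([kind for kind, _ in ops])  # ['ref', 'lit', 'ref', 'ref', 'ref']
assert decode(ops, template) == node_image
```

In the example, the node's image has diverged from the template by two inserted bytes. The encoder emits one reference, one compressed literal holding b"xy", and three more references, and decode() rebuilds the image exactly. The rolling checksum is what lets the later template blocks still be found after the insertion shifts them off block boundaries.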