    • 2. Granted invention patent
    • Arithmetic functions in torus and tree networks
    • US07313582B2
    • 2007-12-25
    • US10468991
    • 2002-02-25
    • Gyan Bhanot, Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas
    • G06F7/38
    • G06F15/17337
    • Methods and systems for performing arithmetic functions. In accordance with a first aspect of the invention, methods and apparatus are provided that combine software algorithms with a hardware implementation of class network routing to achieve a very significant reduction in the time required for global arithmetic operations on the torus. This leads to greater scalability of applications running on large parallel machines. The invention involves three steps for improving the efficiency and accuracy of global operations: (1) ensuring, when necessary, that all the nodes perform the global operation on the data in the same order and so obtain a unique answer, independent of roundoff error; (2) using the topology of the torus to minimize the number of hops, and the bidirectional capabilities of the network to reduce the number of time steps in the data transfer operation to an absolute minimum; and (3) using class function routing to reduce latency in the data transfer. With the method of this invention, every single element is injected into the network only once and is then stored and forwarded without any further software overhead. In accordance with a second aspect of the invention, methods and systems are provided to efficiently implement global arithmetic operations on a network that supports global combining operations. The latency of such global operations is greatly reduced by using these methods.
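Steps (1) and (2) of the abstract above can be illustrated with a minimal Python sketch. This is not the patented implementation: the function names `torus_hops` and `ordered_global_sum`, the use of node ids as the agreed ordering, and the dictionary input are all assumptions made for illustration. The first function counts the minimum hops between two nodes on a torus with wraparound links; the second adds floating-point contributions in one fixed order so that every node computing the sum gets the same bits.

```python
def torus_hops(a, b, dims):
    """Minimum hop count between coordinates a and b on a torus.

    Each dimension has wraparound links, so the distance along one axis
    is the shorter of the direct path and the path through the wrap.
    """
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def ordered_global_sum(contributions):
    """Add per-node values in ascending node-id order (assumed ordering).

    contributions: dict mapping a hypothetical node id to a float.
    Floating-point addition is not associative, so a fixed order is what
    makes the result identical on every node.
    """
    total = 0.0
    for node_id in sorted(contributions):
        total += contributions[node_id]
    return total


if __name__ == "__main__":
    # Hop count on an 8 x 8 x 8 torus between two example coordinates.
    print(torus_hops((0, 0, 0), (7, 1, 4), (8, 8, 8)))

    # The same values arriving in different orders give the same sum
    # once every node sorts by node id before adding.
    arrival_a = {0: 1e16, 1: 1.0, 2: -1e16}
    arrival_b = {2: -1e16, 0: 1e16, 1: 1.0}
    print(ordered_global_sum(arrival_a) == ordered_global_sum(arrival_b))
```

Sorting by node id is only one possible convention; any ordering works as long as all nodes agree on it.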
    • 4. Invention patent application
    • Optimizing layout of an application on a massively parallel supercomputer
    • US20060101104A1
    • 2006-05-11
    • US10963101
    • 2004-10-12
    • Gyan Bhanot, Alan Gara, Philip Heidelberger, Eoin Lawless, James Sexton, Robert Walkup
    • G06F1/16
    • G06F9/5066
    • A general computer-implemented method and apparatus to optimize problem layout on a massively parallel supercomputer is described. The method takes as input the communication matrix of an arbitrary problem in the form of an array whose entries C(i, j) are the amount of data communicated from domain i to domain j. Given C(i, j), a heuristic map is first implemented which attempts sequentially to map a domain and its communication neighbors either to the same supercomputer node or to near-neighbor nodes on the supercomputer torus, while keeping the number of domains mapped to a supercomputer node constant (as much as possible). Next, a Markov chain of maps is generated from the initial map using Monte Carlo simulation with free energy (cost function) $F = \sum_{i,j} C(i,j)\,H(i,j)$, where H(i, j) is the smallest number of hops on the supercomputer torus between domain i and domain j. On the cases tested, it was found that the method produces good mappings and has the potential to be used as a general layout optimization tool for parallel codes. At the moment, the serial code implemented to test the method is unoptimized, so the computation time to find the optimum map can be several hours on a typical PC. For a production implementation, good parallel code for the algorithm would be required, which could itself be run on a supercomputer.
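The cost function in the abstract above, F = Σ C(i, j) H(i, j), and one Monte Carlo move can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicants' code: the abstract does not specify the proposal move or the acceptance rule, so the pairwise swap and the Metropolis acceptance test used here are assumptions, as are the names `free_energy`, `metropolis_step`, and the `placement` list of torus coordinates.

```python
import math
import random


def torus_hops(a, b, dims):
    """Minimum hop count between coordinates a and b on a torus."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def free_energy(C, placement, dims):
    """F = sum over i, j of C[i][j] * H(i, j).

    C[i][j] is the data volume sent from domain i to domain j, and
    placement[i] is the torus coordinate that domain i is mapped to.
    """
    n = len(C)
    return sum(
        C[i][j] * torus_hops(placement[i], placement[j], dims)
        for i in range(n)
        for j in range(n)
    )


def metropolis_step(C, placement, dims, temperature, rng):
    """Propose swapping the nodes of two domains (assumed move).

    A swap keeps the number of domains per node unchanged. Moves that
    lower F are always kept; moves that raise F are kept with
    probability exp(-dF / temperature). Returns the current F.
    """
    i, j = rng.sample(range(len(placement)), 2)
    old = free_energy(C, placement, dims)
    placement[i], placement[j] = placement[j], placement[i]
    new = free_energy(C, placement, dims)
    if new > old and rng.random() >= math.exp(-(new - old) / temperature):
        placement[i], placement[j] = placement[j], placement[i]  # reject: undo
        return old
    return new


if __name__ == "__main__":
    # Four domains that talk along a chain 0-1-2-3, on a 4-node ring.
    C = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
    dims = (4,)
    placement = [(0,), (2,), (1,), (3,)]
    rng = random.Random(0)
    for _ in range(50):
        f = metropolis_step(C, placement, dims, temperature=1.0, rng=rng)
    print(f, placement)
```

Recomputing F from scratch on every step keeps the sketch short; a practical version would update only the terms that involve the two swapped domains.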