    • 1. Granted invention patent
    • Efficient implementation of a multidimensional fast fourier transform on a distributed-memory parallel multi-node computer
    • US07315877B2
    • 2008-01-01
    • US10468998
    • 2002-02-25
    • Gyan V. Bhanot, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas
    • G06F17/14
    • H05K7/20836, F24F11/77, G06F9/52, G06F9/526, G06F15/17381, G06F17/142, G09G5/008, H04L7/0338
    • The present invention is directed to a method, system and program storage device for efficiently implementing a multidimensional Fast Fourier Transform (FFT) of a multidimensional array comprising a plurality of elements initially distributed in a multi-node computer system comprising a plurality of nodes in communication over a network, comprising: distributing the plurality of elements of the array in a first dimension across the plurality of nodes of the computer system over the network to facilitate a first one-dimensional FFT; performing the first one-dimensional FFT on the elements of the array distributed at each node in the first dimension; re-distributing the one-dimensional FFT-transformed elements at each node in a second dimension via “all-to-all” distribution in random order across other nodes of the computer system over the network; and performing a second one-dimensional FFT on elements of the array re-distributed at each node in the second dimension, wherein the random order facilitates efficient utilization of the network thereby efficiently implementing the multidimensional FFT. The “all-to-all” re-distribution of array elements is further efficiently implemented in applications other than the multidimensional FFT on the distributed-memory parallel supercomputer.
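The three steps in the abstract above (first 1-D transform, "all-to-all" re-distribution in random order, second 1-D transform) can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the patented implementation: each "node" is a Python list holding one row, a naive DFT stands in for a real FFT, and the names `dft` and `distributed_fft2` are hypothetical. The simulation does not model network contention, so it shows only that the random send order leaves the result unchanged.

```python
import cmath
import random


def dft(xs):
    """Naive O(n^2) 1-D discrete Fourier transform (stand-in for a real FFT)."""
    n = len(xs)
    return [
        sum(xs[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def distributed_fft2(grid, seed=0):
    """Simulate a 2-D transform of an n x n grid on n 'nodes', one row per node."""
    n = len(grid)
    rng = random.Random(seed)

    # Step 1: each node transforms the row it holds (first dimension).
    rows = [dft(row) for row in grid]

    # Step 2: all-to-all. Node `src` sends element `dst` of its row to node `dst`.
    # Each sender walks its destinations in its own shuffled order (assumption:
    # this is how "random order" is modelled here).
    inbox = [[None] * n for _ in range(n)]
    for src in range(n):
        order = list(range(n))
        rng.shuffle(order)
        for dst in order:
            inbox[dst][src] = rows[src][dst]

    # Step 3: each node transforms the column it received (second dimension).
    cols = [dft(col) for col in inbox]

    # cols[k2][k1] holds output element (k1, k2); transpose back for readability.
    return [[cols[k2][k1] for k2 in range(n)] for k1 in range(n)]
```

Calling `distributed_fft2(grid, seed=s)` with different seeds returns the same values, because the shuffle changes only the order in which elements are delivered, not where they end up.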
    • 2. Granted invention patent
    • Efficient implementation of multidimensional fast fourier transform on a distributed-memory parallel multi-node computer
    • US08095585B2
    • 2012-01-10
    • US11931898
    • 2007-10-31
    • Gyan V. Bhanot, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas
    • G06F17/14
    • H05K7/20836, F24F11/77, G06F9/52, G06F9/526, G06F15/17381, G06F17/142, G09G5/008, H04L7/0338
    • The present invention is directed to a method, system and program storage device for efficiently implementing a multidimensional Fast Fourier Transform (FFT) of a multidimensional array comprising a plurality of elements initially distributed in a multi-node computer system comprising a plurality of nodes in communication over a network, comprising: distributing the plurality of elements of the array in a first dimension across the plurality of nodes of the computer system over the network to facilitate a first one-dimensional FFT; performing the first one-dimensional FFT on the elements of the array distributed at each node in the first dimension; re-distributing the one-dimensional FFT-transformed elements at each node in a second dimension via “all-to-all” distribution in random order across other nodes of the computer system over the network; and performing a second one-dimensional FFT on elements of the array re-distributed at each node in the second dimension, wherein the random order facilitates efficient utilization of the network thereby efficiently implementing the multidimensional FFT. The “all-to-all” re-distribution of array elements is further efficiently implemented in applications other than the multidimensional FFT on the distributed-memory parallel supercomputer.
    • 3. Invention patent application
    • EFFICIENT IMPLEMENTATION OF MULTIDIMENSIONAL FAST FOURIER TRANSFORM ON A DISTRIBUTED-MEMORY PARALLEL MULTI-NODE COMPUTER
    • US20080133633A1
    • 2008-06-05
    • US11931898
    • 2007-10-31
    • Gyan V. Bhanot, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas
    • G06F17/14, G06F9/06
    • H05K7/20836, F24F11/77, G06F9/52, G06F9/526, G06F15/17381, G06F17/142, G09G5/008, H04L7/0338
    • The present invention is directed to a method, system and program storage device for efficiently implementing a multidimensional Fast Fourier Transform (FFT) of a multidimensional array comprising a plurality of elements initially distributed in a multi-node computer system comprising a plurality of nodes in communication over a network, comprising: distributing the plurality of elements of the array in a first dimension across the plurality of nodes of the computer system over the network to facilitate a first one-dimensional FFT; performing the first one-dimensional FFT on the elements of the array distributed at each node in the first dimension; re-distributing the one-dimensional FFT-transformed elements at each node in a second dimension via “all-to-all” distribution in random order across other nodes of the computer system over the network; and performing a second one-dimensional FFT on elements of the array re-distributed at each node in the second dimension, wherein the random order facilitates efficient utilization of the network thereby efficiently implementing the multidimensional FFT. The “all-to-all” re-distribution of array elements is further efficiently implemented in applications other than the multidimensional FFT on the distributed-memory parallel supercomputer.
    • 6. Granted invention patent
    • Low latency memory access and synchronization
    • US07174434B2
    • 2007-02-06
    • US10468994
    • 2002-02-25
    • Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Martin Ohmacht, Burkhard D. Steinmacher-Burow, Todd E. Takken, Pavlos M. Vranas
    • G06F12/12
    • G06F9/52
    • A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device, rather than the processor, performs the subsequent write operation. A simple prefetching scheme for non-contiguous data structures is also disclosed. A memory line is redefined so that, in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers, rather than some other predictive algorithm, are used to determine which memory line to prefetch. This enables hardware to effectively prefetch memory access patterns that are non-contiguous, but repetitive.
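The single-load lock acquisition described in the abstract above can be sketched in software as follows. This is a toy model under stated assumptions, not the hardware design: the class `LockDevice` and its methods are hypothetical names, the abstract does not say how a lock is released (a plain `release` call is assumed), and real hardware would make the read-and-mark step indivisible, which a single-threaded sketch gets for free.

```python
class LockDevice:
    """Hypothetical software model of a hardware locking device."""

    def __init__(self, num_locks):
        self._held = [False] * num_locks

    def load(self, lock_id):
        """Model the processor's single read of a lock.

        Returns True if the lock was free at the time of the read. The device,
        not the caller, performs the follow-up write that marks it as taken.
        """
        was_free = not self._held[lock_id]
        self._held[lock_id] = True  # write done by the device itself
        return was_free

    def release(self, lock_id):
        """Assumed release operation; not described in the abstract."""
        self._held[lock_id] = False


device = LockDevice(num_locks=4)
first_attempt = device.load(2)   # lock was free, caller now owns it
second_attempt = device.load(2)  # lock already owned, caller must retry later
device.release(2)
```

The point of the design is that the caller issues one read and inspects the returned value; there is no separate store from the processor that could interleave with another processor's attempt.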
    • 7. Granted invention patent
    • Method for prefetching non-contiguous data structures
    • US07529895B2
    • 2009-05-05
    • US11617276
    • 2006-12-28
    • Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Martin Ohmacht, Burkhard D. Steinmacher-Burow, Todd E. Takken, Pavlos M. Vranas
    • G06F13/28
    • G06F12/0862, G06F9/52, G06F2212/6028
    • A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device, rather than the processor, performs the subsequent write operation. A simple prefetching scheme for non-contiguous data structures is also disclosed. A memory line is redefined so that, in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers, rather than some other predictive algorithm, are used to determine which memory line to prefetch. This enables hardware to effectively prefetch memory access patterns that are non-contiguous, but repetitive.
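The pointer-per-memory-line prefetching in the abstract above can be illustrated with a small simulation of a repetitive, non-contiguous access pattern. This is a sketch under stated assumptions: the abstract does not say how the pointers are set, so the update policy here (each line remembers which line was accessed right after it last time) is an assumption, as are the one-entry prefetch buffer and the name `PointerPrefetchMemory`.

```python
class PointerPrefetchMemory:
    """Toy model: every memory line carries a pointer to the line to prefetch."""

    def __init__(self, num_lines):
        self.next_ptr = [None] * num_lines  # per-line pointer stored with the data
        self.prefetched = None              # assumed one-entry prefetch buffer
        self.last = None                    # previously accessed line
        self.hits = 0
        self.misses = 0

    def access(self, line):
        # A hit means the line was already fetched by following a pointer.
        if self.prefetched == line:
            self.hits += 1
        else:
            self.misses += 1
        # Assumed update policy: the previous line now points at this one.
        if self.last is not None:
            self.next_ptr[self.last] = line
        self.last = line
        # Follow this line's pointer to decide what to prefetch next.
        self.prefetched = self.next_ptr[line]


memory = PointerPrefetchMemory(num_lines=16)
pattern = [7, 2, 11, 5]          # non-contiguous, but repeated
for _ in range(3):
    for line in pattern:
        memory.access(line)
```

In this model the first pass through the pattern misses on every access while the pointers are being filled in; once the cycle has closed, later passes are served from the prefetch buffer.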
    • 10. Granted invention patent
    • Arithmetic functions in torus and tree networks
    • US07313582B2
    • 2007-12-25
    • US10468991
    • 2002-02-25
    • Gyan Bhanot, Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas
    • G06F7/38
    • G06F15/17337
    • Methods and systems for performing arithmetic functions. In accordance with a first aspect of the invention, methods and apparatus are provided, working in conjunction with software algorithms and a hardware implementation of class network routing, to achieve a very significant reduction in the time required for global arithmetic operations on the torus. This leads to greater scalability of applications running on large parallel machines. The invention involves three steps in improving the efficiency and accuracy of global operations: (1) Ensuring, when necessary, that all the nodes do the global operation on the data in the same order and so obtain a unique answer, independent of roundoff error; (2) Using the topology of the torus to minimize the number of hops and the bidirectional capabilities of the network to reduce the number of time steps in the data transfer operation to an absolute minimum; and (3) Using class function routing to reduce latency in the data transfer. With the method of this invention, every single element is injected into the network only once and it will be stored and forwarded without any further software overhead. In accordance with a second aspect of the invention, methods and systems are provided to efficiently implement global arithmetic operations on a network that supports the global combining operations. The latency of such global operations is greatly reduced by using these methods.
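Step (1) of the abstract above, having every node combine the data in the same order so that roundoff cannot produce different answers, can be shown with a few lines of floating-point arithmetic. This is a sketch under stated assumptions: the functions `reduce_in_order` and `all_reduce`, the sample values, and the per-node arrival orders are all hypothetical, and no network or torus topology is modelled.

```python
def reduce_in_order(values, order):
    """Add the contributions in the given order, as one node would."""
    total = 0.0
    for idx in order:
        total += values[idx]
    return total


def all_reduce(values, per_node_orders):
    """Return the sum each node computes, given the order it adds in."""
    return [reduce_in_order(values, order) for order in per_node_orders]


# One contribution per node; chosen so that floating-point rounding matters.
contributions = [1e16, 1.0, -1e16, 1.0]

# Assumed scenario A: each node adds values in the order they happen to arrive.
arrival_orders = [[0, 1, 2, 3], [0, 2, 1, 3], [1, 3, 0, 2], [2, 0, 3, 1]]
by_arrival = all_reduce(contributions, arrival_orders)

# Scenario B: every node uses the same agreed order (here, by node index).
fixed_orders = [[0, 1, 2, 3]] * 4
by_fixed_order = all_reduce(contributions, fixed_orders)
```

Because floating-point addition is not associative, `by_arrival` contains more than one distinct value, while `by_fixed_order` is identical on every node. The agreed order makes the answer unique across nodes; it does not by itself make it more accurate.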