Patent Quick Search: quickly search global patents, a free patent database for commercial use - IPRDB

101. Invention application

US20090007141A1 MESSAGE PASSING WITH A LIMITED NUMBER OF DMA BYTE COUNTERS (lapsed)
Publication (announcement) No.: US20090007141A1
Publication (announcement) date: 2009-01-01
Application No.: US11768813
Filing date: 2007-06-26
Applicant: Michael Blocksome, Dong Chen, Mark E. Giampapa, Philip Heidelberger, Sameer Kumar, Jeffrey J. Parker
Inventor: Michael Blocksome, Dong Chen, Mark E. Giampapa, Philip Heidelberger, Sameer Kumar, Jeffrey J. Parker
IPC classification: G06F9/44
CPC classification: G06F15/17356, G06F9/546
Abstract: A method for passing messages in a parallel computer system constructed as a plurality of compute nodes interconnected as a network where each compute node includes a DMA engine but includes only a limited number of byte counters for tracking a number of bytes that are sent or received by the DMA engine, where the byte counters may be used in shared counter or exclusive counter modes of operation. The method includes using rendezvous protocol, a source compute node deterministically sending a request to send (RTS) message with a single RTS descriptor using an exclusive injection counter to track both the RTS message and message data to be sent in association with the RTS message, to a destination compute node such that the RTS descriptor indicates to the destination compute node that the message data will be adaptively routed to the destination node. Using one DMA FIFO at the source compute node, the RTS descriptors are maintained for rendezvous messages destined for the destination compute node to ensure proper message data ordering thereat. Using a reception counter at a DMA engine, the destination compute node tracks reception of the RTS and associated message data and sends a clear to send (CTS) message to the source node in a rendezvous protocol form of a remote get to accept the RTS message and message data and processing the remote get (CTS) by the source compute node DMA engine to provide the message data to be sent.
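
The RTS/CTS exchange described in this abstract can be pictured with a small simulation. This is only a sketch under stated assumptions: the names `ByteCounter`, `Node`, `rendezvous_send`, and `RTS_BYTES` are hypothetical, the reception counter here tracks only the message data, and out-of-order delivery (writing packets in reverse) stands in for adaptive routing. It is not the patented implementation.

```python
from dataclasses import dataclass, field

RTS_BYTES = 32  # assumed size of the RTS packet; the abstract gives no figure


@dataclass
class ByteCounter:
    """Hypothetical DMA byte counter: armed with a byte total, counts down to zero."""
    remaining: int = 0

    def arm(self, nbytes: int) -> None:
        self.remaining += nbytes

    def consume(self, nbytes: int) -> None:
        self.remaining -= nbytes

    @property
    def done(self) -> bool:
        return self.remaining == 0


@dataclass
class Node:
    injection: ByteCounter = field(default_factory=ByteCounter)
    reception: ByteCounter = field(default_factory=ByteCounter)
    memory: bytearray = field(default_factory=bytearray)


def rendezvous_send(src: Node, dst: Node, payload: bytes, packet_bytes: int = 8) -> list:
    """Simulate one rendezvous transfer and return the ordered protocol events."""
    events = []
    # 1. One exclusive injection counter covers both the RTS and the message data.
    src.injection.arm(RTS_BYTES + len(payload))
    src.injection.consume(RTS_BYTES)
    events.append("RTS")
    # 2. The destination arms a reception counter and replies with a CTS (a remote get).
    dst.memory = bytearray(len(payload))
    dst.reception.arm(len(payload))
    events.append("CTS")
    # 3. The source DMA serves the remote get. Packets land out of order here to
    #    mimic adaptive routing; the byte counters do not care about arrival order.
    for off in reversed(range(0, len(payload), packet_bytes)):
        chunk = payload[off:off + packet_bytes]
        dst.memory[off:off + len(chunk)] = chunk
        src.injection.consume(len(chunk))
        dst.reception.consume(len(chunk))
    events.append("DATA")
    return events
```

Both counters reach zero exactly when every byte has moved, which is how completion can be detected without tracking individual packets.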

102. Invention application

US20090006770A1 NOVEL SNOOP FILTER FOR FILTERING SNOOP REQUESTS (lapsed)
Publication (announcement) No.: US20090006770A1
Publication (announcement) date: 2009-01-01
Application No.: US12113262
Filing date: 2008-05-01
Applicant: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk I. Hoenicke, Martin Ohmacht, Valentina Salapura, Pavlos M. Vranas
Inventor: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk I. Hoenicke, Martin Ohmacht, Valentina Salapura, Pavlos M. Vranas
IPC classification: G06F12/08
CPC classification: G06F12/0822, G06F12/0831, G06F2212/507, Y02D10/13
Abstract: A method and apparatus for supporting cache coherency in a multiprocessor computing environment having multiple processing units, each processing unit having one or more local cache memories associated and operatively connected therewith. The method comprises providing a snoop filter device associated with each processing unit, each snoop filter device having a plurality of dedicated input ports for receiving snoop requests from dedicated memory writing sources in the multiprocessor computing environment. Each snoop filter device includes a plurality of parallel operating port snoop filters in correspondence with the plurality of dedicated input ports, each port snoop filter implementing one or more parallel operating sub-filter elements that are adapted to concurrently filter snoop requests received from respective dedicated memory writing sources and forward a subset of those requests to its associated processing unit.
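
The per-port filtering structure in this abstract can be sketched in software. The two sub-filter kinds below (an address-range check and a set of lines known to be absent) are assumptions chosen for illustration; the abstract does not name the filtering algorithms, and all identifiers are hypothetical. Real hardware would evaluate the sub-filters concurrently rather than in a loop.

```python
from typing import Callable, Dict, Iterable, List, Set

# A sub-filter returns True when it can prove the line is NOT cached locally.
SubFilter = Callable[[int], bool]


def make_range_filter(lo: int, hi: int) -> SubFilter:
    """Assumed sub-filter: the local cache only ever holds lines in [lo, hi)."""
    return lambda addr: not (lo <= addr < hi)


def make_absent_set_filter(known_absent: Set[int]) -> SubFilter:
    """Assumed sub-filter: a set of lines known not to be cached."""
    return lambda addr: addr in known_absent


class PortSnoopFilter:
    """One filter per dedicated input port, built from several sub-filters."""

    def __init__(self, sub_filters: Iterable[SubFilter]) -> None:
        self.sub_filters = list(sub_filters)

    def should_forward(self, addr: int) -> bool:
        # Drop the request as soon as any sub-filter proves the line is absent.
        return not any(f(addr) for f in self.sub_filters)


class SnoopFilterDevice:
    """Snoop filter device attached to one processing unit."""

    def __init__(self, ports: Dict[str, PortSnoopFilter]) -> None:
        self.ports = ports

    def filter(self, requests: Dict[str, List[int]]) -> List[int]:
        """Return the subset of snoop addresses forwarded to the processor."""
        forwarded = []
        for port, addrs in requests.items():
            forwarded.extend(a for a in addrs if self.ports[port].should_forward(a))
        return forwarded
```

Each sub-filter only has to be conservative: it may fail to rule a request out, but it must never claim absence for a line that is actually cached.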

103. Invention application

US20090006605A1 EXTENDED WRITE COMBINING USING A WRITE CONTINUATION HINT FLAG (lapsed)
Publication (announcement) No.: US20090006605A1
Publication (announcement) date: 2009-01-01
Application No.: US11768593
Filing date: 2007-06-26
Applicant: Dong Chen, Alan Gara, Philip Heidelberger, Martin Ohmacht, Pavlos Vranas
Inventor: Dong Chen, Alan Gara, Philip Heidelberger, Martin Ohmacht, Pavlos Vranas
IPC classification: G06F17/30, G06F15/173
CPC classification: H04L49/9021, G06F12/0862, H04L49/90
Abstract: A computing apparatus for reducing the amount of processing in a network computing system which includes a network system device of a receiving node for receiving electronic messages comprising data. The electronic messages are transmitted from a sending node. The network system device determines when more data of a specific electronic message is being transmitted. A memory device stores the electronic message data and communicates with the network system device. A memory subsystem communicates with the memory device. The memory subsystem stores a portion of the electronic message when more data of the specific message will be received, and the buffer combines the portion with later received data and moves the data to the memory device for accessible storage.
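
A minimal sketch of the buffering behaviour this abstract describes, assuming each packet carries a boolean continuation hint (`more`) and that the buffer is keyed by message. `Packet`, `WriteCombiner`, and the field names are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Packet:
    message_id: int
    data: bytes
    more: bool  # assumed write-continuation hint: more data of this message follows


@dataclass
class WriteCombiner:
    """Holds partial message data until the hint flag says the message is complete."""
    memory: Dict[int, bytes] = field(default_factory=dict)
    pending: Dict[int, bytearray] = field(default_factory=dict)
    memory_writes: int = 0

    def receive(self, pkt: Packet) -> None:
        # Combine this portion with whatever was buffered earlier for the message.
        buf = self.pending.setdefault(pkt.message_id, bytearray())
        buf.extend(pkt.data)
        if not pkt.more:
            # No continuation expected: move the combined data to memory in one write.
            self.memory[pkt.message_id] = bytes(self.pending.pop(pkt.message_id))
            self.memory_writes += 1
```

In this toy model three packets of one message cost a single memory write instead of three, which is the kind of saving the abstract attributes to combining.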

104. Invention application

US20080313408A1 LOW LATENCY MEMORY ACCESS AND SYNCHRONIZATION (lapsed)
Publication (announcement) No.: US20080313408A1
Publication (announcement) date: 2008-12-18
Application No.: US12196796
Filing date: 2008-08-22
Applicant: Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Martin Ohmacht, Burkhard D. Steinmacher-Burow, Todd E. Takken, Pavlos M. Vranas
Inventor: Matthias A. Blumrich, Dong Chen, Paul W. Coteus, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Martin Ohmacht, Burkhard D. Steinmacher-Burow, Todd E. Takken, Pavlos M. Vranas
IPC classification: G06F12/08
CPC classification: G06F12/0862, G06F9/52, G06F2212/6028
Abstract: A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple prefetching for non-contiguous data structures is also disclosed. A memory line is redefined so that in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers are used to determine which memory line to prefetch rather than some other predictive algorithm. This enables hardware to effectively prefetch memory access patterns that are non-contiguous, but repetitive.
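
The two mechanisms in this abstract can be modelled roughly as follows. Both classes are hypothetical sketches: `LockDevice.load` treats a read as a test-and-set performed by the device, and `PointerPrefetchMemory` assumes that each line's pointer is trained to the line accessed right after it, a policy the abstract does not specify.

```python
from typing import List, Optional, Set


class LockDevice:
    """Toy lock device: one read both tests the lock and, if free, takes it."""

    def __init__(self, n_locks: int) -> None:
        self._held = [False] * n_locks

    def load(self, lock_id: int) -> int:
        """Single load by the processor; returns 1 if the caller now owns the lock."""
        if self._held[lock_id]:
            return 0
        self._held[lock_id] = True  # the device, not the processor, performs the write
        return 1

    def release(self, lock_id: int) -> None:
        self._held[lock_id] = False


class PointerPrefetchMemory:
    """Every line carries a pointer to the line that should be prefetched next."""

    def __init__(self, n_lines: int) -> None:
        self.next_ptr: List[Optional[int]] = [None] * n_lines
        self.prefetched: Set[int] = set()
        self.last: Optional[int] = None

    def access(self, line: int) -> bool:
        """Access a line; return True if it had already been prefetched."""
        hit = line in self.prefetched
        if self.last is not None:
            # Assumed training rule: the previous line now points at this one.
            self.next_ptr[self.last] = line
        nxt = self.next_ptr[line]
        self.prefetched = {nxt} if nxt is not None else set()
        self.last = line
        return hit
```

With a repeating non-contiguous pattern such as lines 3, 7, 1, the first pass misses everywhere while the pointers are learned, and later passes start to hit.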

105. Granted invention patent

US07457303B2 One-bounce network (lapsed)
Publication (announcement) No.: US07457303B2
Publication (announcement) date: 2008-11-25
Application No.: US10675129
Filing date: 2003-09-30
Applicant: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas
Inventor: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas
IPC classification: H04L12/28
CPC classification: H04L45/06
Abstract: A one-bounce data network comprises a plurality of nodes interconnected to each other via communication links, the network including a plurality of interconnected switch devices, said switch devices interconnected such that a message communicated between any two switches passes over a single link from a source switch to a destination switch; and, the source switch concurrently sends a message to an arbitrary bounce switch which then sends the message to the destination switch.
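
The routing options this abstract implies can be enumerated in a few lines. The sketch assumes the switches are numbered 0 to n-1 and fully connected, so a message either takes the direct link or goes through exactly one intermediate (bounce) switch; `one_bounce_paths` and `pick_route` are hypothetical helper names.

```python
import random
from typing import List


def one_bounce_paths(src: int, dst: int, n_switches: int) -> List[List[int]]:
    """All routes from src to dst that use the direct link or a single bounce switch."""
    direct = [src, dst]
    bounced = [[src, b, dst] for b in range(n_switches) if b not in (src, dst)]
    return [direct] + bounced


def pick_route(src: int, dst: int, n_switches: int, rng: random.Random) -> List[int]:
    """Choose one of the available routes at random (an arbitrary bounce switch)."""
    return rng.choice(one_bounce_paths(src, dst, n_switches))
```

Between any pair of n switches there are n - 1 such routes, so traffic for one busy pair can be spread over many links instead of saturating the single direct one.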

106. Invention application

US20080222364A1 SNOOP FILTERING SYSTEM IN A MULTIPROCESSOR SYSTEM (in force)
Publication (announcement) No.: US20080222364A1
Publication (announcement) date: 2008-09-11
Application No.: US12126674
Filing date: 2008-05-23
Applicant: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk I. Hoenicke, Martin Ohmacht, Valentina Salapura, Pavlos M. Vranas
Inventor: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk I. Hoenicke, Martin Ohmacht, Valentina Salapura, Pavlos M. Vranas
IPC classification: G06F12/08
CPC classification: G06F12/0831, G06F12/0813, Y02D10/13
Abstract: A system and method for supporting cache coherency in a computing environment having multiple processing units, each unit having an associated cache memory system operatively coupled therewith. The system includes a plurality of interconnected snoop filter units, each snoop filter unit corresponding to and in communication with a respective processing unit, with each snoop filter unit comprising a plurality of devices for receiving asynchronous snoop requests from respective memory writing sources in the computing environment; and a point-to-point interconnect comprising communication links for directly connecting memory writing sources to corresponding receiving devices; and, a plurality of parallel operating filter devices coupled in one-to-one correspondence with each receiving device for processing snoop requests received thereat and one of forwarding requests or preventing forwarding of requests to its associated processing unit. Each of the plurality of parallel operating filter devices comprises parallel operating sub-filter elements, each simultaneously receiving an identical snoop request and implementing one or more different snoop filter algorithms for determining those snoop requests for data that are determined not cached locally at the associated processing unit and preventing forwarding of those requests to the processor unit. In this manner, a number of snoop requests forwarded to a processing unit is reduced thereby increasing performance of the computing environment.

107. Invention application

US20080133633A1 EFFICIENT IMPLEMENTATION OF MULTIDIMENSIONAL FAST FOURIER TRANSFORM ON A DISTRIBUTED-MEMORY PARALLEL MULTI-NODE COMPUTER (lapsed)
Publication (announcement) No.: US20080133633A1
Publication (announcement) date: 2008-06-05
Application No.: US11931898
Filing date: 2007-10-31
Applicant: Gyan V. Bhanot, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas
Inventor: Gyan V. Bhanot, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas
IPC classification: G06F17/14, G06F9/06
CPC classification: H05K7/20836, F24F11/77, G06F9/52, G06F9/526, G06F15/17381, G06F17/142, G09G5/008, H04L7/0338
Abstract: The present invention is directed to a method, system and program storage device for efficiently implementing a multidimensional Fast Fourier Transform (FFT) of a multidimensional array comprising a plurality of elements initially distributed in a multi-node computer system comprising a plurality of nodes in communication over a network, comprising: distributing the plurality of elements of the array in a first dimension across the plurality of nodes of the computer system over the network to facilitate a first one-dimensional FFT; performing the first one-dimensional FFT on the elements of the array distributed at each node in the first dimension; re-distributing the one-dimensional FFT-transformed elements at each node in a second dimension via “all-to-all” distribution in random order across other nodes of the computer system over the network; and performing a second one-dimensional FFT on elements of the array re-distributed at each node in the second dimension, wherein the random order facilitates efficient utilization of the network thereby efficiently implementing the multidimensional FFT. The “all-to-all” re-distribution of array elements is further efficiently implemented in applications other than the multidimensional FFT on the distributed-memory parallel supercomputer.
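
The three steps in this abstract (row transforms, a randomly ordered all-to-all redistribution, column transforms) can be simulated on one machine. This sketch assumes a square N x N array with node i initially owning row i, and it swaps in a naive O(N^2) DFT for the FFT to stay dependency-free; `dft` and `distributed_fft2` are hypothetical names.

```python
import cmath
import random
from typing import List

Vector = List[complex]


def dft(x: Vector) -> Vector:
    """Naive one-dimensional DFT, standing in for a real FFT routine."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def distributed_fft2(a: List[Vector], rng: random.Random) -> List[Vector]:
    """2-D transform of a square array, with node i initially owning row i."""
    n = len(a)
    # Step 1: every node transforms the row it holds (first dimension).
    rows = [dft(row) for row in a]
    # Step 2: all-to-all redistribution in random order. Node i sends its j-th
    # element to node j, so afterwards node j holds column j.
    transfers = [(i, j) for i in range(n) for j in range(n)]
    rng.shuffle(transfers)
    cols = [[0j] * n for _ in range(n)]
    for i, j in transfers:
        cols[j][i] = rows[i][j]
    # Step 3: every node transforms the column it now holds (second dimension).
    cols = [dft(col) for col in cols]
    # Gather into the usual layout: result[u][v] lives at position u on node v.
    return [[cols[v][u] for v in range(n)] for u in range(n)]
```

The shuffle does not change the numerical result; in the abstract its purpose is to spread traffic so that the network is used efficiently during the redistribution.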

108. Invention application

US20080104367A1 Collective Network For Computer Structures (in force)
Publication (announcement) No.: US20080104367A1
Publication (announcement) date: 2008-05-01
Application No.: US11572372
Filing date: 2005-07-18
Applicant: Matthias A. Blumrich, Paul W. Coteus, Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Todd E. Takken, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas
Inventor: Matthias A. Blumrich, Paul W. Coteus, Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Todd E. Takken, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas
IPC classification: G06F15/80, G06F9/30
CPC classification: G06F15/17381, H04L1/1845, H04L12/4641
Abstract: A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network and class structures. The global collective network may be configured to provide global barrier and interrupt functionality in asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to needs of a processing algorithm.
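
A collective reduction of the kind this abstract mentions can be illustrated with a combine-up, broadcast-down pass. The tree topology is an assumption made for this sketch (the abstract only says that routers interconnect the nodes), and `tree_allreduce` with its `children` map is a hypothetical interface.

```python
from typing import Callable, Dict, List


def tree_allreduce(
    values: Dict[int, int],
    children: Dict[int, List[int]],
    root: int,
    op: Callable[[int, int], int],
) -> Dict[int, int]:
    """Combine every node's value up an assumed tree, then hand the result to all nodes."""

    def reduce_up(node: int) -> int:
        # Each router combines its own contribution with what its children send up.
        acc = values[node]
        for child in children.get(node, []):
            acc = op(acc, reduce_up(child))
        return acc

    total = reduce_up(root)
    # Broadcast phase: every participating node receives the same reduced value.
    return {node: total for node in values}
```

Because the combining happens inside the network at each hop, the number of sequential steps grows with the depth of the tree rather than with the number of nodes.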

109. Invention application

US20060224838A1 Novel snoop filter for filtering snoop requests (in force)
Publication (announcement) No.: US20060224838A1
Publication (announcement) date: 2006-10-05
Application No.: US11093152
Filing date: 2005-03-29
Applicant: Matthias Blumrich, Dong Chen, Alan Gara, Mark Giampapa, Philip Heidelberger, Dirk Hoenicke, Martin Ohmacht, Valentina Salapura, Pavlos Vranas
Inventor: Matthias Blumrich, Dong Chen, Alan Gara, Mark Giampapa, Philip Heidelberger, Dirk Hoenicke, Martin Ohmacht, Valentina Salapura, Pavlos Vranas
IPC classification: G06F13/28
CPC classification: G06F12/0822, G06F12/0831, G06F2212/507, Y02D10/13
Abstract: A method and apparatus for supporting cache coherency in a multiprocessor computing environment having multiple processing units, each processing unit having one or more local cache memories associated and operatively connected therewith. The method comprises providing a snoop filter device associated with each processing unit, each snoop filter device having a plurality of dedicated input ports for receiving snoop requests from dedicated memory writing sources in the multiprocessor computing environment. Each snoop filter device includes a plurality of parallel operating port snoop filters in correspondence with the plurality of dedicated input ports, each port snoop filter implementing one or more parallel operating sub-filter elements that are adapted to concurrently filter snoop requests received from respective dedicated memory writing sources and forward a subset of those requests to its associated processing unit.

110. Invention application

US20060224835A1 Snoop filtering system in a multiprocessor system (in force)
Publication (announcement) No.: US20060224835A1
Publication (announcement) date: 2006-10-05
Application No.: US11093127
Filing date: 2005-03-29
Applicant: Matthias Blumrich, Dong Chen, Alan Gara, Mark Giampapa, Philip Heidelberger, Dirk Hoenicke, Martin Ohmacht, Valentina Salapura, Pavlos Vranas
Inventor: Matthias Blumrich, Dong Chen, Alan Gara, Mark Giampapa, Philip Heidelberger, Dirk Hoenicke, Martin Ohmacht, Valentina Salapura, Pavlos Vranas
IPC classification: G06F13/28
CPC classification: G06F12/0831, G06F12/0813, Y02D10/13
Abstract: A system and method for supporting cache coherency in a computing environment having multiple processing units, each unit having an associated cache memory system operatively coupled therewith. The system includes a plurality of interconnected snoop filter units, each snoop filter unit corresponding to and in communication with a respective processing unit, with each snoop filter unit comprising a plurality of devices for receiving asynchronous snoop requests from respective memory writing sources in the computing environment; and a point-to-point interconnect comprising communication links for directly connecting memory writing sources to corresponding receiving devices; and, a plurality of parallel operating filter devices coupled in one-to-one correspondence with each receiving device for processing snoop requests received thereat and one of forwarding requests or preventing forwarding of requests to its associated processing unit. Each of the plurality of parallel operating filter devices comprises parallel operating sub-filter elements, each simultaneously receiving an identical snoop request and implementing one or more different snoop filter algorithms for determining those snoop requests for data that are determined not cached locally at the associated processing unit and preventing forwarding of those requests to the processor unit. In this manner, a number of snoop requests forwarded to a processing unit is reduced thereby increasing performance of the computing environment.
