专利快速检索-快速检索全球专利，免费商用专利数据库-IPRDB

21. 发明申请

US20090259713A1 NOVEL MASSIVELY PARALLEL SUPERCOMPUTER 有权
标题翻译：新的大型并行超级计算机
公开(公告)号：US20090259713A1
公开(公告)日：2009-10-15
申请号：US12492799
申请日：2009-06-26
申请人： Matthias A. Blumrich , Dong Chen , George L. Chiu , Thomas M. Cipolla , Paul W. Coteus , Alan G. Gara , Mark E. Giampapa , Philip Heidelberger , Gerard V. Kopcsay , Lawrence S. Mok , Todd E. Takken
发明人： Matthias A. Blumrich , Dong Chen , George L. Chiu , Thomas M. Cipolla , Paul W. Coteus , Alan G. Gara , Mark E. Giampapa , Philip Heidelberger , Gerard V. Kopcsay , Lawrence S. Mok , Todd E. Takken
IPC分类号： G06F15/76 , G06F15/16 , G06F11/28 , G06F12/08 , G06F9/02 , G06F15/177
CPC分类号： H05K7/20836 , F24F11/77 , G06F9/52 , G06F9/526 , G06F15/17381 , G06F17/142 , G09G5/008 , H04L7/0338
摘要： A novel massively parallel supercomputer of hundreds of teraOPS-scale includes node architectures based upon System-On-a-Chip technology, i.e., each processing node comprises a single Application Specific Integrated Circuit (ASIC). Within each ASIC node is a plurality of processing elements each of which consists of a central processing unit (CPU) and plurality of floating point processors to enable optimal balance of computational performance, packaging density, low cost, and power and cooling requirements. The plurality of processors within a single node may be used individually or simultaneously to work on any combination of computation or communication as required by the particular algorithm being solved or executed at any point in time. The system-on-a-chip ASIC nodes are interconnected by multiple independent networks that optimally maximizes packet communications throughput and minimizes latency. In the preferred embodiment, the multiple networks include three high-speed networks for parallel algorithm message passing including a Torus, Global Tree, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. For particular classes of parallel algorithms, or parts of parallel calculations, this architecture exhibits exceptional computational performance, and may be enabled to perform calculations for new classes of parallel algorithms. Additional networks are provided for external connectivity and used for Input/Output, System Management and Configuration, and Debug and Monitoring functions. Special node packaging techniques implementing midplane and other hardware devices facilitates partitioning of the supercomputer in multiple networks for optimizing supercomputing resources.
摘要翻译：数百个teraOPS级别的新型大规模并行超级计算机包括基于片上系统技术的节点架构，即每个处理节点包括单个专用集成电路（ASIC）。在每个ASIC节点内是多个处理元件，每个处理元件由中央处理单元（CPU）和多个浮点处理器组成，以实现计算性能，封装密度，低成本以及功率和冷却要求的最佳平衡。单个节点内的多个处理器可以单独使用或同时使用，以在任何时间点解决或执行的特定算法所要求的任何计算或通信组合上工作。片上系统ASIC节点通过多个独立网络互连，从而最大限度地最大限度地提高了分组通信吞吐量并最大限度地减少了延迟。在优选实施例中，多个网络包括用于并行算法消息传递的三个高速网络，包括提供全局障碍和通知功能的环形，全局树和全球异步网络。这些多个独立网络可以根据用于优化算法处理性能的算法的需求或阶段来协同或独立地利用。对于特定类别的并行算法或并行计算的部分，该架构具有出色的计算性能，并且可以启用对新类并行算法执行计算。为外部连接提供附加网络，用于输入/输出，系统管理和配置以及调试和监控功能。实现中平面和其他硬件设备的特殊节点打包技术有助于在多个网络中划分超级计算机，以优化超级计算资源。

22. 发明授权

US07380071B2 Snoop filtering system in a multiprocessor system 有权
标题翻译：多处理器系统中的Snoop过滤系统
公开(公告)号：US07380071B2
公开(公告)日：2008-05-27
申请号：US11093127
申请日：2005-03-29
申请人： Matthias A. Blumrich , Dong Chen , Alan G. Gara , Mark E. Giampapa , Philip Heidelberger , Dirk I. Hoenicke , Martin Ohmacht , Valentina Salapura , Pavlos M. Vranas
发明人： Matthias A. Blumrich , Dong Chen , Alan G. Gara , Mark E. Giampapa , Philip Heidelberger , Dirk I. Hoenicke , Martin Ohmacht , Valentina Salapura , Pavlos M. Vranas
IPC分类号： G06F13/28 , G06F12/00
CPC分类号： G06F12/0831 , G06F12/0813 , Y02D10/13
摘要： A system and method for supporting cache coherency in a computing environment having multiple processing units, each unit having an associated cache memory system operatively coupled therewith. The system includes a plurality of interconnected snoop filter units, each snoop filter unit corresponding to and in communication with a respective processing unit, with each snoop filter unit comprising a plurality of devices for receiving asynchronous snoop requests from respective memory writing sources in the computing environment; and a point-to-point interconnect comprising communication links for directly connecting memory writing sources to corresponding receiving devices; and, a plurality of parallel operating filter devices coupled in one-to-one correspondence with each receiving device for processing snoop requests received thereat and one of forwarding requests or preventing forwarding of requests to its associated processing unit. Each of the plurality of parallel operating filter devices comprises parallel operating sub-filter elements, each simultaneously receiving an identical snoop request and implementing one or more different snoop filter algorithms for determining those snoop requests for data that are determined not cached locally at the associated processing unit and preventing forwarding of those requests to the processor unit. In this manner, a number of snoop requests forwarded to a processing unit is reduced thereby increasing performance of the computing environment.
摘要翻译：一种用于在具有多个处理单元的计算环境中支持高速缓存一致性的系统和方法，每个单元具有与其可操作地耦合的相关联的高速缓冲存储器系统。该系统包括多个互连的窥探过滤器单元，每个窥探过滤器单元对应于相应处理单元并与其通信，每个窥探过滤器单元包括用于在计算环境中从相应存储器写入源接收异步窥探请求的多个设备 ; 以及包括用于将存储器写入源直接连接到对应的接收设备的通信链路的点对点互连; 以及与每个接收设备一一对应地耦合的多个并行操作过滤器设备，用于处理在其上接收的窥探请求，并且转发请求之一或者阻止将请求转发到其相关联的处理单元。多个并行操作过滤器装置中的每一个包括并行操作子滤波器元件，每个并行操作子滤波器元件同时接收相同的窥探请求，并且实现一个或多个不同的窥探滤波器算法，用于确定对于在相关处理中本地未被缓存的数据被确定的窥探请求并且防止将这些请求转发到处理器单元。以这种方式，减少了转发到处理单元的多个窥探请求，从而增加了计算环境的性能。

23. 发明授权

US08626957B2 Collective network for computer structures 有权
标题翻译：计算机结构集体网络
公开(公告)号：US08626957B2
公开(公告)日：2014-01-07
申请号：US13101566
申请日：2011-05-05
申请人： Matthias A. Blumrich , Paul W. Coteus , Dong Chen , Alan Gara , Mark E. Giampapa , Philip Heidelberger , Dirk Hoenicke , Todd E. Takken , Burkhard D. Steinmacher-Burow , Pavlos M. Vranas
发明人： Matthias A. Blumrich , Paul W. Coteus , Dong Chen , Alan Gara , Mark E. Giampapa , Philip Heidelberger , Dirk Hoenicke , Todd E. Takken , Burkhard D. Steinmacher-Burow , Pavlos M. Vranas
IPC分类号： G06F15/16
CPC分类号： H04L1/08 , G06F9/46 , G06F11/08 , G06F11/1423 , H03M13/09 , H04L1/0061 , H04L1/1607 , H04L1/1867 , H04L2001/0093 , H04L2001/0097
摘要： A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network. The global collective network may be configured to provide global barrier and interrupt functionality in asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to needs of a processing algorithm.
摘要翻译：一种用于实现互连处理节点之间的高速，低延迟全局集体通信的系统和方法。全局集体网络最优地使得能够在具有多个互连处理节点的计算机结构中执行并行算法操作期间执行集体缩减操作。包括通过链路互连网络节点的路由器设备，以便于在虚拟网络的节点处执行低延迟全局处理操作。全局集体网络可以被配置为以异步或同步方式提供全局屏障和中断功能。当在大规模并行超级计算结构中实现时，全局集体网络根据处理算法的需要在物理上和逻辑上可分割。

24. 发明授权

US08001280B2 Collective network for computer structures 有权
标题翻译：计算机结构集体网络
公开(公告)号：US08001280B2
公开(公告)日：2011-08-16
申请号：US11572372
申请日：2005-07-18
申请人： Matthias A. Blumrich , Paul W. Coteus , Dong Chen , Alan Gara , Mark E. Giampapa , Philip Heidelberger , Dirk Hoenicke , Todd E. Takken , Burkhard D. Steinmacher-Burow , Pavlos M. Vranas
发明人： Matthias A. Blumrich , Paul W. Coteus , Dong Chen , Alan Gara , Mark E. Giampapa , Philip Heidelberger , Dirk Hoenicke , Todd E. Takken , Burkhard D. Steinmacher-Burow , Pavlos M. Vranas
IPC分类号： G06F15/16
CPC分类号： G06F15/17381 , H04L1/1845 , H04L12/4641
摘要： A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices ate included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network and class structures. The global collective network may be configured to provide global barrier and interrupt functionality in asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to needs of a processing algorithm.
摘要翻译：一种用于实现互连处理节点之间的高速，低延迟全局集体通信的系统和方法。全局集体网络最优地使得能够在具有多个互连处理节点的计算机结构中执行并行算法操作期间执行集体缩减操作。路由器设备包括通过链路互连网络的节点，以便于在虚拟网络和类结构的节点处执行低延迟全局处理操作。全局集体网络可以被配置为以异步或同步方式提供全局屏障和中断功能。当在大规模并行超级计算结构中实现时，全局集体网络根据处理算法的需要在物理上和逻辑上可分割。

25. 发明申请

US20110219280A1 COLLECTIVE NETWORK FOR COMPUTER STRUCTURES 有权
标题翻译：电脑结构的集体网络
公开(公告)号：US20110219280A1
公开(公告)日：2011-09-08
申请号：US13101566
申请日：2011-05-05
申请人： Matthias A. Blumrich , Paul W. Coteus , Dong Chen , Alan Gara , Mark E. Giampapa , Philip Heidelberger , Dirk Hoenicke , Todd E. Takken , Burkhard D. Steinmacher-Burow , Pavlos M. Vranas
发明人： Matthias A. Blumrich , Paul W. Coteus , Dong Chen , Alan Gara , Mark E. Giampapa , Philip Heidelberger , Dirk Hoenicke , Todd E. Takken , Burkhard D. Steinmacher-Burow , Pavlos M. Vranas
IPC分类号： H03M13/09 , H04L1/08 , G06F11/10 , G06F11/14
CPC分类号： H04L1/08 , G06F9/46 , G06F11/08 , G06F11/1423 , H03M13/09 , H04L1/0061 , H04L1/1607 , H04L1/1867 , H04L2001/0093 , H04L2001/0097
摘要： A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network and class structures. The global collective network may be configured to provide global barrier and interrupt functionality in asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to needs of a processing algorithm.
摘要翻译：一种用于实现互连处理节点之间的高速，低延迟全局集体通信的系统和方法。全局集体网络最优地使得能够在具有多个互连处理节点的计算机结构中执行并行算法操作期间执行集体缩减操作。包括通过链路互连网络节点的路由器设备，以便于在虚拟网络和类结构的节点处执行低延迟全局处理操作。全局集体网络可以被配置为以异步或同步方式提供全局屏障和中断功能。当在大规模并行超级计算结构中实现时，全局集体网络根据处理算法的需要在物理上和逻辑上可分割。

26. 发明授权

US08255638B2 Snoop filter for filtering snoop requests 失效
标题翻译：用于过滤窥探请求的Snoop过滤器
公开(公告)号：US08255638B2
公开(公告)日：2012-08-28
申请号：US12113262
申请日：2008-05-01
申请人： Matthias A. Blumrich , Dong Chen , Alan G. Gara , Mark E. Giampapa , Philip Heidelberger , Dirk I. Hoenicke , Martin Ohmacht , Valentina Salapura , Pavlos M. Vranas
发明人： Matthias A. Blumrich , Dong Chen , Alan G. Gara , Mark E. Giampapa , Philip Heidelberger , Dirk I. Hoenicke , Martin Ohmacht , Valentina Salapura , Pavlos M. Vranas
IPC分类号： G06F12/00 , G06F13/00
CPC分类号： G06F12/0822 , G06F12/0831 , G06F2212/507 , Y02D10/13
摘要： A method and apparatus for supporting cache coherency in a multiprocessor computing environment having multiple processing units, each processing unit having one or more local cache memories associated and operatively connected therewith. The method comprises providing a snoop filter device associated with each processing unit, each snoop filter device having a plurality of dedicated input ports for receiving snoop requests from dedicated memory writing sources in the multiprocessor computing environment. Each snoop filter device includes a plurality of parallel operating port snoop filters in correspondence with the plurality of dedicated input ports, each port snoop filter implementing one or more parallel operating sub-filter elements that are adapted to concurrently filter snoop requests received from respective dedicated memory writing sources and forward a subset of those requests to its associated processing unit.
摘要翻译：一种用于在具有多个处理单元的多处理器计算环境中支持高速缓存一致性的方法和装置，每个处理单元具有与其相关联并与之可操作地相连的一个或多个本地高速缓冲存储器。该方法包括提供与每个处理单元相关联的窥探过滤器设备，每个窥探过滤器设备具有多个专用输入端口，用于从多处理器计算环境中的专用存储器写入源接收窥探请求。每个窥探过滤器装置包括与多个专用输入端口相对应的多个并行操作端口窥探滤波器，每个端口窥探滤波器实现一个或多个并行操作子滤波器元件，其适于同时滤除从相应专用存储器接收的窥探请求写入源并将这些请求的子集转发到其相关联的处理单元。

27. 发明申请

US20090006672A1 METHOD AND APPARATUS FOR EFFICIENTLY TRACKING QUEUE ENTRIES RELATIVE TO A TIMESTAMP 失效
标题翻译：有效跟踪与TIMESTAMP相关的队列的方法和设备
公开(公告)号：US20090006672A1
公开(公告)日：2009-01-01
申请号：US11768800
申请日：2007-06-26
申请人： Matthias A. Blumrich , Dong Chen , Alan G. Gara , Mark E. Giampapa , Philip Heidelberger , Martin Ohmacht , Valentina Salapura , Pavlos Vranas
发明人： Matthias A. Blumrich , Dong Chen , Alan G. Gara , Mark E. Giampapa , Philip Heidelberger , Martin Ohmacht , Valentina Salapura , Pavlos Vranas
IPC分类号： G06F3/00 , G06F1/04
CPC分类号： G06F12/0835 , G06F12/0831
摘要： An apparatus and method for tracking coherence event signals transmitted in a multiprocessor system. The apparatus comprises a coherence logic unit, each unit having a plurality of queue structures with each queue structure associated with a respective sender of event signals transmitted in the system. A timing circuit associated with a queue structure controls enqueuing and dequeuing of received coherence event signals, and, a counter tracks a number of coherence event signals remaining enqueued in the queue structure and dequeued since receipt of a timestamp signal. A counter mechanism generates an output signal indicating that all of the coherence event signals present in the queue structure at the time of receipt of the timestamp signal have been dequeued. In one embodiment, the timestamp signal is asserted at the start of a memory synchronization operation and, the output signal indicates that all coherence events present when the timestamp signal was asserted have completed. This signal can then be used as part of the completion condition for the memory synchronization operation.
摘要翻译：一种用于跟踪在多处理器系统中发送的相干事件信号的装置和方法。该装置包括相干逻辑单元，每个单元具有多个队列结构，每个队列结构与在系统中传输的事件信号的相应发送者相关联。与队列结构相关联的定时电路控制接收的相干事件信号的排队和出队，并且计数器跟踪队列结构中剩余入队的多个相干事件信号，并且从接收到时间戳信号起出队。计数器机构产生一个输出信号，指示在接收时间戳信号时存在于队列结构中的所有相干事件信号已经出队。在一个实施例中，时间戳信号在存储器同步操作的开始被断言，并且输出信号指示当时间戳信号被断言时存在的所有相干事件已经完成。然后可以将该信号用作存储器同步操作的完成条件的一部分。

28. 发明申请

US20080104367A1 Collective Network For Computer Structures 有权
标题翻译：计算机结构集体网
公开(公告)号：US20080104367A1
公开(公告)日：2008-05-01
申请号：US11572372
申请日：2005-07-18
申请人： Matthias A. Blumrich , Paul W. Coteus , Dong Chen , Alan Gara , Mark E. Giampapa , Philip Heidelberger , Dirk Hoenicke , Todd E. Takken , Burkhard D. Steinmacher-Burow , Pavlos M. Vranas
发明人： Matthias A. Blumrich , Paul W. Coteus , Dong Chen , Alan Gara , Mark E. Giampapa , Philip Heidelberger , Dirk Hoenicke , Todd E. Takken , Burkhard D. Steinmacher-Burow , Pavlos M. Vranas
IPC分类号： G06F15/80 , G06F9/30
CPC分类号： G06F15/17381 , H04L1/1845 , H04L12/4641
摘要： A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices ate included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network and class structures. The global collective network may be configured to provide global barrier and interrupt functionality in asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to needs of a processing algorithm.
摘要翻译：一种用于实现互连处理节点之间的高速，低延迟全局集体通信的系统和方法。全局集体网络最优地使得能够在具有多个互连处理节点的计算机结构中执行并行算法操作期间执行集体缩减操作。路由器设备包括通过链路互连网络的节点，以便于在虚拟网络和类结构的节点处执行低延迟全局处理操作。全局集体网络可以被配置为以异步或同步方式提供全局屏障和中断功能。当在大规模并行超级计算结构中实现时，全局集体网络根据处理算法的需要在物理上和逻辑上可分割。

29. 发明授权

US08756350B2 Method and apparatus for efficiently tracking queue entries relative to a timestamp 失效
标题翻译：相对于时间戳有效跟踪队列条目的方法和装置
公开(公告)号：US08756350B2
公开(公告)日：2014-06-17
申请号：US11768800
申请日：2007-06-26
申请人： Matthias A. Blumrich , Dong Chen , Alan G. Gara , Mark E. Giampapa , Philip Heidelberger , Martin Ohmacht , Valentina Salapura , Pavlos Vranas
发明人： Matthias A. Blumrich , Dong Chen , Alan G. Gara , Mark E. Giampapa , Philip Heidelberger , Martin Ohmacht , Valentina Salapura , Pavlos Vranas
IPC分类号： G06F3/00 , G06F5/00
CPC分类号： G06F12/0835 , G06F12/0831
摘要： An apparatus and method for tracking coherence event signals transmitted in a multiprocessor system. The apparatus comprises a coherence logic unit, each unit having a plurality of queue structures with each queue structure associated with a respective sender of event signals transmitted in the system. A timing circuit associated with a queue structure controls enqueuing and dequeuing of received coherence event signals, and, a counter tracks a number of coherence event signals remaining enqueued in the queue structure and dequeued since receipt of a timestamp signal. A counter mechanism generates an output signal indicating that all of the coherence event signals present in the queue structure at the time of receipt of the timestamp signal have been dequeued. In one embodiment, the timestamp signal is asserted at the start of a memory synchronization operation and, the output signal indicates that all coherence events present when the timestamp signal was asserted have completed. This signal can then be used as part of the completion condition for the memory synchronization operation.
摘要翻译：一种用于跟踪在多处理器系统中发送的相干事件信号的装置和方法。该装置包括相干逻辑单元，每个单元具有多个队列结构，每个队列结构与在系统中传输的事件信号的相应发送者相关联。与队列结构相关联的定时电路控制接收的相干事件信号的排队和出队，并且计数器跟踪队列结构中剩余入队的多个相干事件信号，并且从接收到时间戳信号起出队。计数器机构产生一个输出信号，指示在接收时间戳信号时存在于队列结构中的所有相干事件信号已经出队。在一个实施例中，时间戳信号在存储器同步操作的开始被断言，并且输出信号指示当时间戳信号被断言时存在的所有相干事件已经完成。然后可以将该信号用作存储器同步操作的完成条件的一部分。

30. 发明授权

US07603523B2 Method and apparatus for filtering snoop requests in a point-to-point interconnect architecture 失效
标题翻译：用于在点对点互连架构中过滤窥探请求的方法和装置
公开(公告)号：US07603523B2
公开(公告)日：2009-10-13
申请号：US12035085
申请日：2008-02-21
申请人： Matthias A. Blumrich , Dong Chen , Alan G. Gara , Mark E. Giampapa , Philip Heidelberger , Dirk I. Hoenicke , Martin Ohmacht , Valentina Salapura , Pavlos M. Vranas
发明人： Matthias A. Blumrich , Dong Chen , Alan G. Gara , Mark E. Giampapa , Philip Heidelberger , Dirk I. Hoenicke , Martin Ohmacht , Valentina Salapura , Pavlos M. Vranas
IPC分类号： G06F12/08 , G06F13/00
CPC分类号： G06F12/0831 , G06F12/084 , Y02D10/13
摘要： A method and apparatus for supporting cache coherency in a multiprocessor computing environment having multiple processing units, each processing unit having one or more local cache memories associated and operatively connected therewith. The method comprises providing a snoop filter device associated with each processing unit, each snoop filter device having a plurality of dedicated input ports for receiving snoop requests from dedicated memory writing sources in the multiprocessor computing environment. Each of the memory writing sources is directly connected to the dedicated input ports of all other snoop filter devices associated with all other processing units in a point-to-point interconnect fashion. Each snoop filter device includes a plurality of parallel operating port snoop filters in correspondence with the plurality of dedicated input ports that are adapted to concurrently filter snoop requests received from respective dedicated memory writing sources and forward a subset of those requests to its associated processing unit.
摘要翻译：一种用于在具有多个处理单元的多处理器计算环境中支持高速缓存一致性的方法和装置，每个处理单元具有与其相关联并与之可操作地相连的一个或多个本地高速缓冲存储器。该方法包括提供与每个处理单元相关联的窥探过滤器设备，每个窥探过滤器设备具有多个专用输入端口，用于从多处理器计算环境中的专用存储器写入源接收窥探请求。每个存储器写入源以点对点互连方式直接连接到与所有其他处理单元相关联的所有其他窥探滤波器设备的专用输入端口。每个窥探过滤器装置包括与多个专用输入端口相对应的多个并行操作端口窥探滤波器，该多个专用输入端口适于同时滤除从相应专用存储器写入源接收到的窥探请求，并将这些请求的子集转发到其相关联的处理单元。

你已经成功收藏专利！

检索式保存成功!

IPRDB

热门服务

关于我们

友情链接

联系方式