Quick Patent Search: search global patents quickly, free commercial patent database - IPRDB

91. Granted patent

US08677073B2 Snoop filter for filtering snoop requests (In force)
Publication No.: US08677073B2
Publication date: 2014-03-18
Application No.: US13587420
Filing date: 2012-08-16
Applicant: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk I. Hoenicke, Martin Ohmacht, Valentina Salapura, Pavlos M. Vranas
Inventor: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk I. Hoenicke, Martin Ohmacht, Valentina Salapura, Pavlos M. Vranas
IPC classification: G06F13/28, G06F12/00
CPC classification: G06F12/0822, G06F12/0831, G06F2212/507, Y02D10/13
Abstract: A method and apparatus for supporting cache coherency in a multiprocessor computing environment having multiple processing units, each processing unit having one or more local cache memories associated and operatively connected therewith. The method comprises providing a snoop filter device associated with each processing unit, each snoop filter device having a plurality of dedicated input ports for receiving snoop requests from dedicated memory writing sources in the multiprocessor computing environment. Each snoop filter device includes a plurality of parallel operating port snoop filters in correspondence with the plurality of dedicated input ports, each port snoop filter implementing one or more parallel operating sub-filter elements that are adapted to concurrently filter snoop requests received from respective dedicated memory writing sources and forward a subset of those requests to its associated processing unit.

92. Patent application

US20090006770A1 NOVEL SNOOP FILTER FOR FILTERING SNOOP REQUESTS (Lapsed)
Publication No.: US20090006770A1
Publication date: 2009-01-01
Application No.: US12113262
Filing date: 2008-05-01
Applicant: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk I. Hoenicke, Martin Ohmacht, Valentina Salapura, Pavlos M. Vranas
Inventor: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk I. Hoenicke, Martin Ohmacht, Valentina Salapura, Pavlos M. Vranas
IPC classification: G06F12/08
CPC classification: G06F12/0822, G06F12/0831, G06F2212/507, Y02D10/13
Abstract: A method and apparatus for supporting cache coherency in a multiprocessor computing environment having multiple processing units, each processing unit having one or more local cache memories associated and operatively connected therewith. The method comprises providing a snoop filter device associated with each processing unit, each snoop filter device having a plurality of dedicated input ports for receiving snoop requests from dedicated memory writing sources in the multiprocessor computing environment. Each snoop filter device includes a plurality of parallel operating port snoop filters in correspondence with the plurality of dedicated input ports, each port snoop filter implementing one or more parallel operating sub-filter elements that are adapted to concurrently filter snoop requests received from respective dedicated memory writing sources and forward a subset of those requests to its associated processing unit.

93. Patent application

US20080222364A1 SNOOP FILTERING SYSTEM IN A MULTIPROCESSOR SYSTEM (In force)
Publication No.: US20080222364A1
Publication date: 2008-09-11
Application No.: US12126674
Filing date: 2008-05-23
Applicant: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk I. Hoenicke, Martin Ohmacht, Valentina Salapura, Pavlos M. Vranas
Inventor: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk I. Hoenicke, Martin Ohmacht, Valentina Salapura, Pavlos M. Vranas
IPC classification: G06F12/08
CPC classification: G06F12/0831, G06F12/0813, Y02D10/13
Abstract: A system and method for supporting cache coherency in a computing environment having multiple processing units, each unit having an associated cache memory system operatively coupled therewith. The system includes a plurality of interconnected snoop filter units, each snoop filter unit corresponding to and in communication with a respective processing unit, with each snoop filter unit comprising a plurality of devices for receiving asynchronous snoop requests from respective memory writing sources in the computing environment; a point-to-point interconnect comprising communication links for directly connecting memory writing sources to corresponding receiving devices; and a plurality of parallel operating filter devices coupled in one-to-one correspondence with each receiving device for processing snoop requests received thereat and either forwarding requests or preventing forwarding of requests to its associated processing unit. Each of the plurality of parallel operating filter devices comprises parallel operating sub-filter elements, each simultaneously receiving an identical snoop request and implementing one or more different snoop filter algorithms for identifying those snoop requests for data that is determined not to be cached locally at the associated processing unit and preventing forwarding of those requests to the processing unit. In this manner, the number of snoop requests forwarded to a processing unit is reduced, thereby increasing performance of the computing environment.
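The filtering rule in the abstract above (several sub-filters look at the same snoop request, and the request is dropped when any of them determines the data is not cached locally) can be illustrated with a rough Python sketch. This is only a software model under stated assumptions, not the patented hardware: the two sub-filter algorithms (`make_range_filter`, `make_recent_invalidate_filter`) and the function `port_filter` are hypothetical names invented for illustration, and the sub-filters are evaluated sequentially here rather than in parallel.

```python
from typing import Callable, Iterable, List, Set

# A sub-filter returns True when it can show the address is NOT cached locally.
SubFilter = Callable[[int], bool]


def make_range_filter(lo: int, hi: int) -> SubFilter:
    """Hypothetical sub-filter: only addresses in [lo, hi) can be cached."""
    return lambda addr: not (lo <= addr < hi)


def make_recent_invalidate_filter(recent: Set[int]) -> SubFilter:
    """Hypothetical sub-filter: addresses already invalidated and not reloaded."""
    return lambda addr: addr in recent


def port_filter(requests: Iterable[int], sub_filters: List[SubFilter]) -> List[int]:
    """Forward a snoop request only if no sub-filter rules it out."""
    forwarded = []
    for addr in requests:
        # Every sub-filter sees the identical request; one "not cached" verdict
        # is enough to suppress forwarding to the processing unit.
        if not any(f(addr) for f in sub_filters):
            forwarded.append(addr)
    return forwarded


filters = [
    make_range_filter(0x1000, 0x2000),
    make_recent_invalidate_filter({0x1040}),
]
forwarded = port_filter([0x0800, 0x1040, 0x1080], filters)
```

In this toy run only `0x1080` reaches the processor: `0x0800` is outside the cacheable range and `0x1040` was recently invalidated.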

94. Granted patent

US08117288B2 Optimizing layout of an application on a massively parallel supercomputer (Lapsed)
Publication No.: US08117288B2
Publication date: 2012-02-14
Application No.: US10963101
Filing date: 2004-10-12
Applicant: Gyan V. Bhanot, Alan Gara, Philip Heidelberger, Eoin M. Lawless, James C. Sexton, Robert E. Walkup
Inventor: Gyan V. Bhanot, Alan Gara, Philip Heidelberger, Eoin M. Lawless, James C. Sexton, Robert E. Walkup
IPC classification: G06F15/177
CPC classification: G06F9/5066
Abstract: A general computer-implemented method and apparatus to optimize problem layout on a massively parallel supercomputer is described. The method takes as input the communication matrix of an arbitrary problem in the form of an array whose entries C(i, j) are the amount of data communicated from domain i to domain j. Given C(i, j), first a heuristic map is implemented which attempts sequentially to map a domain and its communications neighbors either to the same supercomputer node or to near-neighbor nodes on the supercomputer torus while keeping the number of domains mapped to a supercomputer node constant (as much as possible). Next a Markov Chain of maps is generated from the initial map using Monte Carlo simulation with Free Energy (cost function) $F = \sum_{i,j} C(i,j)\,H(i,j)$, where H(i, j) is the smallest number of hops on the supercomputer torus between domain i and domain j. On the cases tested, it was found that the method produces good mappings and has the potential to be used as a general layout optimization tool for parallel codes. At the moment, the serial code implemented to test the method is un-optimized so that computation time to find the optimum map can be several hours on a typical PC. For production implementation, good parallel code for our algorithm would be required which could itself be implemented on a supercomputer.
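The cost function in the abstract above, F as the sum over i, j of C(i, j) H(i, j), is easy to compute once H is defined as the torus hop count. The Python sketch below shows that computation and one possible Monte Carlo move. It rests on assumptions not stated in the abstract: a Metropolis acceptance rule with a `temperature` parameter, a move that swaps the nodes of two domains, and the names `torus_hops`, `cost` and `metropolis_step`, all of which are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple

Node = Tuple[int, ...]


def torus_hops(a: Node, b: Node, dims: Sequence[int]) -> int:
    """H(i, j): fewest hops between two torus coordinates, with wraparound."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def cost(C: List[List[float]], placement: List[Node], dims: Sequence[int]) -> float:
    """F = sum over i, j of C[i][j] * H(i, j) for the given domain-to-node map."""
    n = len(C)
    return sum(
        C[i][j] * torus_hops(placement[i], placement[j], dims)
        for i in range(n)
        for j in range(n)
    )


def metropolis_step(C, placement, dims, temperature: float, rng: random.Random) -> float:
    """Swap two domains' nodes; keep the swap per the (assumed) Metropolis rule."""
    i, j = rng.sample(range(len(placement)), 2)
    before = cost(C, placement, dims)
    placement[i], placement[j] = placement[j], placement[i]
    after = cost(C, placement, dims)
    delta = after - before
    if delta > 0 and rng.random() >= math.exp(-delta / temperature):
        # Rejected: undo the swap and report the unchanged cost.
        placement[i], placement[j] = placement[j], placement[i]
        return before
    return after


dims = (4,)                                   # a 1-D torus with 4 nodes
C = [[0, 5, 0], [5, 0, 0], [0, 0, 0]]         # domains 0 and 1 exchange data
placement = [(0,), (2,), (1,)]                # domains 0 and 1 start 2 hops apart
initial_cost = cost(C, placement, dims)
```

Swapping node assignments keeps the number of domains per node unchanged, which matches the constraint mentioned in the abstract; recomputing the full cost on every step is done for clarity, not speed.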

95. Patent application

US20060101104A1 Optimizing layout of an application on a massively parallel supercomputer (Lapsed)
Publication No.: US20060101104A1
Publication date: 2006-05-11
Application No.: US10963101
Filing date: 2004-10-12
Applicant: Gyan Bhanot, Alan Gara, Philip Heidelberger, Eoin Lawless, James Sexton, Robert Walkup
Inventor: Gyan Bhanot, Alan Gara, Philip Heidelberger, Eoin Lawless, James Sexton, Robert Walkup
IPC classification: G06F1/16
CPC classification: G06F9/5066
Abstract: A general computer-implemented method and apparatus to optimize problem layout on a massively parallel supercomputer is described. The method takes as input the communication matrix of an arbitrary problem in the form of an array whose entries C(i, j) are the amount of data communicated from domain i to domain j. Given C(i, j), first a heuristic map is implemented which attempts sequentially to map a domain and its communications neighbors either to the same supercomputer node or to near-neighbor nodes on the supercomputer torus while keeping the number of domains mapped to a supercomputer node constant (as much as possible). Next a Markov Chain of maps is generated from the initial map using Monte Carlo simulation with Free Energy (cost function) $F = \sum_{i,j} C(i,j)\,H(i,j)$, where H(i, j) is the smallest number of hops on the supercomputer torus between domain i and domain j. On the cases tested, it was found that the method produces good mappings and has the potential to be used as a general layout optimization tool for parallel codes. At the moment, the serial code implemented to test the method is un-optimized so that computation time to find the optimum map can be several hours on a typical PC. For production implementation, good parallel code for our algorithm would be required which could itself be implemented on a supercomputer.

96. Granted patent

US09137098B2 T-Star interconnection network topology (In force)
Publication No.: US09137098B2
Publication date: 2015-09-15
Application No.: US13584300
Filing date: 2012-08-13
Applicant: Dong Chen, Paul W. Coteus, Noel A. Eisley, Philip Heidelberger, Robert M. Senger, Yutaka Sugawara
Inventor: Dong Chen, Paul W. Coteus, Noel A. Eisley, Philip Heidelberger, Robert M. Senger, Yutaka Sugawara
IPC classification: H04L12/715, H04L12/24
CPC classification: H04L41/0663, H04L41/12, H04L41/145, H04L45/04
Abstract: According to one embodiment of the present invention, a method of constructing network communication for a grid of node groups is provided, the grid including an M dimensional grid, each node group including N nodes, wherein M is greater than or equal to one and N is greater than one, wherein each node includes a router. The method includes directly connecting each node in each node group to every other node in the node group via intra-group links and directly connecting each node in each node group of the M dimensional grid to a node in each neighboring node group in the M dimensional grid via inter-group links.
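The link construction described in the abstract above can be enumerated in a few lines of Python. The sketch makes two assumptions the abstract does not fix: the grid has no wraparound links, and node k of a group connects to node k of each neighboring group. The function name `t_star_links` and the `(group, index)` node representation are likewise illustrative.

```python
from itertools import combinations, product
from typing import Sequence, Set, Tuple

NodeId = Tuple[Tuple[int, ...], int]   # (group coordinate, index within group)
Link = Tuple[NodeId, NodeId]


def t_star_links(shape: Sequence[int], n: int) -> Tuple[Set[Link], Set[Link]]:
    """Return (intra_group_links, inter_group_links) for an M-dimensional grid."""
    intra: Set[Link] = set()
    inter: Set[Link] = set()
    for group in product(*(range(s) for s in shape)):
        # Intra-group: every node is directly connected to every other node.
        for a, b in combinations(range(n), 2):
            intra.add(((group, a), (group, b)))
        # Inter-group: connect to the next group along each dimension.
        # Visiting only the "+1" neighbor records each undirected link once.
        for dim, size in enumerate(shape):
            if group[dim] + 1 < size:
                neighbor = group[:dim] + (group[dim] + 1,) + group[dim + 1:]
                for k in range(n):
                    inter.add(((group, k), (neighbor, k)))
    return intra, inter


intra_links, inter_links = t_star_links((2, 2), 3)
```

For a 2 x 2 grid of groups with 3 nodes each, this yields 12 intra-group links (4 groups, 3 links each) and 12 inter-group links (4 neighboring group pairs, 3 links each).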

97. Patent application

US20140044015A1 T-STAR INTERCONNECTION NETWORK TOPOLOGY (Under examination, published)
Publication No.: US20140044015A1
Publication date: 2014-02-13
Application No.: US13584300
Filing date: 2012-08-13
Applicant: Dong Chen, Paul W. Coteus, Noel A. Eisley, Philip Heidelberger, Robert M. Senger, Yutaka Sugawara
Inventor: Dong Chen, Paul W. Coteus, Noel A. Eisley, Philip Heidelberger, Robert M. Senger, Yutaka Sugawara
IPC classification: H04L12/28
CPC classification: H04L41/0663, H04L41/12, H04L41/145, H04L45/04
Abstract: According to one embodiment of the present invention, a method of constructing network communication for a grid of node groups is provided, the grid including an M dimensional grid, each node group including N nodes, wherein M is greater than or equal to one and N is greater than one, wherein each node includes a router. The method includes directly connecting each node in each node group to every other node in the node group via intra-group links and directly connecting each node in each node group of the M dimensional grid to a node in each neighboring node group in the M dimensional grid via inter-group links.

98. Granted patent

US08112559B2 Increasing available FIFO space to prevent messaging queue deadlocks in a DMA environment (In force)
Publication No.: US08112559B2
Publication date: 2012-02-07
Application No.: US12241634
Filing date: 2008-09-30
Applicant: Michael A. Blocksome, Dong Chen, Thomas Gooding, Philip Heidelberger, Jeff Parker
Inventor: Michael A. Blocksome, Dong Chen, Thomas Gooding, Philip Heidelberger, Jeff Parker
IPC classification: G06F13/28, G06F15/167
CPC classification: G06F13/28
Abstract: Embodiments of the invention may be used to manage message queues in a parallel computing environment to prevent message queue deadlock. A direct memory access controller of a compute node may determine when a messaging queue is full. In response, the DMA may generate an interrupt. An interrupt handler may stop the DMA and swap all descriptors from the full messaging queue into a larger queue (or enlarge the original queue). The interrupt handler then restarts the DMA. Alternatively, the interrupt handler stops the DMA, allocates a memory block to hold queue data, and then moves descriptors from the full messaging queue into the allocated memory block. The interrupt handler then restarts the DMA. During a normal messaging advance cycle, a messaging manager attempts to inject the descriptors in the memory block into other messaging queues until the descriptors have all been processed.
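The first variant in the abstract above (stop the DMA, move all descriptors into a larger queue, restart the DMA) can be mimicked with a small Python simulation. It is a toy model only: the class `DmaSim`, the capacity-doubling policy and the `running` flag are assumptions made for illustration and do not correspond to any real DMA driver interface.

```python
from typing import List


class DmaSim:
    """Toy model of a messaging FIFO that is enlarged when it fills up."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.fifo: List[str] = []
        self.running = True        # stands in for the DMA engine state
        self.grow_events = 0       # how many "FIFO full" interrupts were handled

    def inject(self, descriptor: str) -> None:
        """Queue a descriptor, raising the simulated interrupt when full."""
        if len(self.fifo) >= self.capacity:
            self._on_fifo_full()
        self.fifo.append(descriptor)

    def _on_fifo_full(self) -> None:
        """Simulated interrupt handler: stop, enlarge, restart."""
        self.running = False               # stop the DMA
        self.capacity *= 2                 # assumed policy: double the queue
        self.fifo = list(self.fifo)        # move descriptors to the larger queue
        self.grow_events += 1
        self.running = True                # restart the DMA


dma = DmaSim(capacity=2)
for i in range(5):
    dma.inject(f"desc{i}")
```

After five injections into a queue that started with room for two, the handler has run twice and the queue order is preserved.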

99. Patent application

US20110173349A1 I/O ROUTING IN A MULTIDIMENSIONAL TORUS NETWORK (In force)
Publication No.: US20110173349A1
Publication date: 2011-07-14
Application No.: US12697175
Filing date: 2010-01-29
Applicant: Dong Chen, Noel A. Eisley, Philip Heidelberger
Inventor: Dong Chen, Noel A. Eisley, Philip Heidelberger
IPC classification: G06F3/00
CPC classification: H04L45/00, G06F15/17387, H04L45/06
Abstract: A method, system and computer program product are disclosed for routing data packets in a computing system comprising a multidimensional torus compute node network including a multitude of compute nodes, and an I/O node network including a plurality of I/O nodes. In one embodiment, the method comprises assigning to each of the data packets a destination address identifying one of the compute nodes; providing each of the data packets with a toio value; routing the data packets through the compute node network to the destination addresses of the data packets; and when each of the data packets reaches the destination address assigned to said each data packet, routing said each data packet to one of the I/O nodes if the toio value of said each data packet is a specified value. In one embodiment, each of the data packets is also provided with an ioreturn value used to route the data packets through the compute node network.
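The decision made when a packet reaches its destination compute node, as described in the abstract above, can be sketched in Python. The field name `toio` comes from the abstract; everything else is an assumption for illustration, including the `Packet` layout, the value `TOIO_SET = 1` as the "specified value", the `io_links` table mapping compute nodes to attached I/O nodes, and the fallback to local delivery when no I/O link exists.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Coord = Tuple[int, ...]
TOIO_SET = 1   # assumed "specified value" that requests hand-off to an I/O node


@dataclass
class Packet:
    dest: Coord          # destination compute node on the torus
    toio: int            # hand-off flag named in the abstract
    payload: bytes = b""


def next_action(packet: Packet, here: Coord, io_links: Dict[Coord, str]) -> str:
    """Decide what the node at `here` does with the packet."""
    if here != packet.dest:
        return "forward"                     # keep routing through the torus
    if packet.toio == TOIO_SET and here in io_links:
        return f"io:{io_links[here]}"        # hand off to the attached I/O node
    return "local"                           # deliver to this compute node


io_links = {(0, 0, 0): "ionode-A"}           # hypothetical attachment table
to_io = Packet(dest=(0, 0, 0), toio=1)
to_compute = Packet(dest=(0, 0, 0), toio=0)
```

A packet with `toio` set is treated like any other packet until it arrives at its destination compute node; only there does the flag change the outcome.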

100. Patent application

US20110173343A1 ZONE ROUTING IN A TORUS NETWORK (Lapsed)
Publication No.: US20110173343A1
Publication date: 2011-07-14
Application No.: US12684184
Filing date: 2010-01-08
Applicant: Dong Chen, Philip Heidelberger, Sameer Kumar
Inventor: Dong Chen, Philip Heidelberger, Sameer Kumar
IPC classification: G06F15/173
CPC classification: G06F15/17381
Abstract: A system for routing data in a network comprising a network logic device at a sending node for determining a path between the sending node and a receiving node, wherein the network logic device sets one or more selection bits and one or more hint bits within a data packet, and a control register for storing one or more masks, wherein the network logic device uses the one or more selection bits to select a mask from the control register, applies the selected mask to the hint bits to restrict routing of the data packet to one or more routing directions within the network, selects one of the restricted routing directions, and sends the data packet along a link in the selected routing direction toward the receiving node.
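The mask-and-hint mechanism in the abstract above comes down to a bitwise AND followed by a choice among the surviving directions. The Python sketch below assumes a 3-D torus with one hint bit per direction, a control register modeled as a list of integer masks indexed by the selection bits, and a "first allowed direction" tie-break; none of these details are given in the abstract, and the names are illustrative.

```python
from typing import List, Optional

# Assumed bit layout: bit 0 = x+, bit 1 = x-, bit 2 = y+, and so on.
DIRECTIONS = ["x+", "x-", "y+", "y-", "z+", "z-"]


def allowed_directions(hint_bits: int, selection: int, control_register: List[int]) -> List[str]:
    """Apply the mask chosen by the selection bits to the packet's hint bits."""
    mask = control_register[selection]
    restricted = hint_bits & mask
    return [d for bit, d in enumerate(DIRECTIONS) if (restricted >> bit) & 1]


def pick_direction(hint_bits: int, selection: int, control_register: List[int]) -> Optional[str]:
    """Choose one restricted direction (assumed policy: the first one allowed)."""
    dirs = allowed_directions(hint_bits, selection, control_register)
    return dirs[0] if dirs else None


control_register = [0b000011, 0b001100]   # mask 0: x only; mask 1: y only
hint_bits = 0b000101                      # packet may travel x+ or y+
x_zone = allowed_directions(hint_bits, 0, control_register)
y_zone = allowed_directions(hint_bits, 1, control_register)
```

With the same hint bits, selection value 0 confines the packet to the x dimension and selection value 1 to the y dimension, which is the kind of zone restriction the abstract describes.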
