Quick Patent Search - fast search of global patents, free commercial patent database - IPRDB

41. Invention application

US20130024648A1 TLB EXCLUSION RANGE [Status: in force]
Publication No.: US20130024648A1
Publication date: 2013-01-24
Application No.: US13618730
Filing date: 2012-09-14
Applicants: Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Jon K. Kriegel, Martin Ohmacht, Burkhard Steinmacher-Burow
Inventors: Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Jon K. Kriegel, Martin Ohmacht, Burkhard Steinmacher-Burow
IPC classification: G06F12/10
CPC classification: G06F12/1027, G06F2212/652, G06F2212/654
Abstract: A system and method for accessing memory are provided. The system comprises a lookup buffer for storing one or more page table entries, wherein each of the one or more page table entries comprises at least a virtual page number and a physical page number; and a logic circuit for receiving a virtual address from a processor, said logic circuit matching the virtual address to the virtual page number in one of the page table entries to select the physical page number in the same page table entry, said page table entry having one or more bits set to exclude a memory range from a page.
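The lookup described in this abstract can be illustrated with a minimal sketch. The abstract only says that an entry has "one or more bits set to exclude a memory range from a page"; the 4 KiB page size and the `excl_start` / `excl_len` fields below are hypothetical choices made for illustration, not the encoding used in the patent.

```python
from dataclasses import dataclass
from typing import List, Optional

PAGE_SHIFT = 12  # assumption: 4 KiB pages (not stated in the abstract)


@dataclass
class TlbEntry:
    vpn: int             # virtual page number
    ppn: int             # physical page number
    excl_start: int = 0  # hypothetical: first excluded offset within the page
    excl_len: int = 0    # hypothetical: excluded byte count, 0 = nothing excluded


def translate(tlb: List[TlbEntry], vaddr: int) -> Optional[int]:
    """Return the physical address, or None on a miss or an excluded offset."""
    vpn = vaddr >> PAGE_SHIFT
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    for entry in tlb:
        if entry.vpn != vpn:
            continue
        if entry.excl_start <= offset < entry.excl_start + entry.excl_len:
            return None  # the address falls inside the excluded range
        return (entry.ppn << PAGE_SHIFT) | offset
    return None  # no entry matched the virtual page number


# One entry mapping virtual page 0x10 to physical page 0x80,
# with offsets 0x800..0x8FF excluded from the mapping.
tlb = [TlbEntry(vpn=0x10, ppn=0x80, excl_start=0x800, excl_len=0x100)]
```

In this sketch an excluded address behaves like a miss, so a different entry or a fault handler could serve it; how real hardware treats such an access is not specified in the abstract.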

42. Invention application

US20110219215A1 ATOMICITY: A MULTI-PRONGED APPROACH [Status: under examination, published]
Publication No.: US20110219215A1
Publication date: 2011-09-08
Application No.: US13008546
Filing date: 2011-01-18
Applicants: Matthias A. Blumrich, Dong Chen, Alan Gara, Philip Heidelberger, Martin Ohmacht, Burkhard Steinmacher-Burow
Inventors: Matthias A. Blumrich, Dong Chen, Alan Gara, Philip Heidelberger, Martin Ohmacht, Burkhard Steinmacher-Burow
IPC classification: G06F9/30
CPC classification: G06F9/524, G06F12/08
Abstract: In a multiprocessor system with speculative execution, atomicity can be approached in several fashions. One approach is to have atomic instructions that achieve multiple functions and are guaranteed to complete. Another approach is to have blocks of code that are grouped to succeed or fail together. A system can incorporate more than one such approach. In implementing more than one approach, the system may prioritize one over another. When conflict detection is done through a directory lookup in cache memory, atomic instructions and atomicity related operations may be implemented in a cache data array access pipeline in that cache memory. This implementation may include feedback to the pipeline for implementing multiple functions within an atomic instruction and also for cascading atomic instructions.

43. Invention application

US20110219208A1 MULTI-PETASCALE HIGHLY EFFICIENT PARALLEL SUPERCOMPUTER [Status: in force]
Publication No.: US20110219208A1
Publication date: 2011-09-08
Application No.: US13004007
Filing date: 2011-01-10
Applicants: Sameh Asaad, Ralph E. Bellofatto, Michael A. Blocksome, Matthias A. Blumrich, Peter Boyle, Jose R. Brunheroto, Dong Chen, Chen-Yong Cher, George L. Chiu, Norman Christ, Paul W. Coteus, Kristan D. Davis, Gabor J. Dozsa, Alexandre E. Eichenberger, Noel A. Eisley, Matthew R. Ellavsky, Kahn C. Evans, Bruce M. Fleischer, Thomas W. Fox, Alan Gara, Mark E. Giampapa, Thomas M. Gooding, Michael K. Gschwind, John A. Gunnels, Shawn A. Hall, Rudolf A. Haring, Philip Heidelberger, Todd A. Inglett, Brant L. Knudson, Gerard V. Kopcsay, Sameer Kumar, Amith R. Mamidala, James A. Marcella, Mark G. Megerian, Douglas R. Miller, Samuel J. Miller, Adam J. Muff, Michael B. Mundy, John K. O'Brien, Kathryn M. O'Brien, Martin Ohmacht, Jeffrey J. Parker, Ruth J. Poole, Joseph D. Ratterman, Valentina Salapura, David L. Satterfield, Robert M. Senger, Brian Smith, Burkhard Steinmacher-Burow, William M. Stockdell, Craig B. Stunkel, Krishnan Sugavanam, Yutaka Sugawara, Todd E. Takken, Barry M. Trager, James L. Van Oosten, Charles D. Wait, Robert E. Walkup, Alfred T. Watson, Robert W. Wisniewski, Peng Wu
Inventors: Sameh Asaad, Ralph E. Bellofatto, Michael A. Blocksome, Matthias A. Blumrich, Peter Boyle, Jose R. Brunheroto, Dong Chen, Chen-Yong Cher, George L. Chiu, Norman Christ, Paul W. Coteus, Kristan D. Davis, Gabor J. Dozsa, Alexandre E. Eichenberger, Noel A. Eisley, Matthew R. Ellavsky, Kahn C. Evans, Bruce M. Fleischer, Thomas W. Fox, Alan Gara, Mark E. Giampapa, Thomas M. Gooding, Michael K. Gschwind, John A. Gunnels, Shawn A. Hall, Rudolf A. Haring, Philip Heidelberger, Todd A. Inglett, Brant L. Knudson, Gerard V. Kopcsay, Sameer Kumar, Amith R. Mamidala, James A. Marcella, Mark G. Megerian, Douglas R. Miller, Samuel J. Miller, Adam J. Muff, Michael B. Mundy, John K. O'Brien, Kathryn M. O'Brien, Martin Ohmacht, Jeffrey J. Parker, Ruth J. Poole, Joseph D. Ratterman, Valentina Salapura, David L. Satterfield, Robert M. Senger, Brian Smith, Burkhard Steinmacher-Burow, William M. Stockdell, Craig B. Stunkel, Krishnan Sugavanam, Yutaka Sugawara, Todd E. Takken, Barry M. Trager, James L. Van Oosten, Charles D. Wait, Robert E. Walkup, Alfred T. Watson, Robert W. Wisniewski, Peng Wu
IPC classification: G06F15/76, G06F9/06
CPC classification: G06F13/287, G06F9/06, G06F9/3004, G06F9/30047, G06F9/3885, G06F12/0811, G06F12/0831, G06F12/0862, G06F12/0864, G06F12/1027, G06F15/17381, G06F15/17387, G06F15/76, G06F15/8069, G06F2212/1016, G06F2212/602, G06F2212/6022, G06F2212/6024, G06F2212/6032, Y02D10/13, Y02D10/14
Abstract: A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing is described, at decreased cost, power and footprint, that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enable a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, with each having full access to all system resources and enabling adaptive partitioning of the processors to functions such as compute or messaging I/O on an application-by-application basis, and preferably enabling adaptive partitioning of functions in accordance with various algorithmic phases within an application; if I/O or other processors are underutilized, they can participate in computation or communication. Nodes are interconnected by a five-dimensional torus network with DMA that optimally maximizes the throughput of packet communications between nodes and minimizes latency.

44. Granted invention patent

US07620841B2 Re-utilizing partially failed resources as network resources [Status: lapsed]
Publication No.: US07620841B2
Publication date: 2009-11-17
Application No.: US11335784
Filing date: 2006-01-19
Applicants: Dong Chen, Alan Gara, Philip Heidelberger, Thomas Alan Liebsch, Burkhard Steinmacher-Burow, Pavlos Michael Vranas
Inventors: Dong Chen, Alan Gara, Philip Heidelberger, Thomas Alan Liebsch, Burkhard Steinmacher-Burow, Pavlos Michael Vranas
IPC classification: G06F11/00
CPC classification: G06F11/0793, G06F11/0724
Abstract: A method and apparatus for re-utilizing partially failed compute resources in a massively parallel super computer system. In the preferred embodiments the compute node comprises a number of clock domains that can be enabled separately. When an error in a compute node is detected, and the failure is not in network communication blocks, a clock enable circuit enables the clocks to the network communication blocks only to allow the partially failed compute node to be re-utilized as a network resource. The computer system can then continue to operate with only slightly diminished performance and thereby improve performance and perceived overall reliability.

45. Invention application

US20080104367A1 Collective Network For Computer Structures [Status: in force]
Publication No.: US20080104367A1
Publication date: 2008-05-01
Application No.: US11572372
Filing date: 2005-07-18
Applicants: Matthias A. Blumrich, Paul W. Coteus, Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Todd E. Takken, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas
Inventors: Matthias A. Blumrich, Paul W. Coteus, Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Todd E. Takken, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas
IPC classification: G06F15/80, G06F9/30
CPC classification: G06F15/17381, H04L1/1845, H04L12/4641
Abstract: A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network and class structures. The global collective network may be configured to provide global barrier and interrupt functionality in an asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to the needs of a processing algorithm.
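A collective reduction of the kind this abstract mentions can be sketched as follows. The sketch assumes the virtual network is shaped like a tree, which the abstract does not state; `children`, `values`, and `reduce_tree` are hypothetical names, and the recursion stands in for what router hardware would do link by link.

```python
import operator


def reduce_tree(children, values, node, op):
    """Combine the value at `node` with the reduced values of its subtrees.

    children: dict mapping a node id to the list of its child node ids
    values:   dict mapping every node id to its local contribution
    op:       associative binary operator (for example add or max)
    """
    result = values[node]
    for child in children.get(node, []):
        # Each router combines what arrives from below before forwarding upward.
        result = op(result, reduce_tree(children, values, child, op))
    return result


# Five nodes: node 0 is the root, nodes 1 and 2 are its children,
# and nodes 3 and 4 hang under node 1.
children = {0: [1, 2], 1: [3, 4]}
values = {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}

total = reduce_tree(children, values, 0, operator.add)
largest = reduce_tree(children, values, 0, max)
```

Once the root holds the combined result, broadcasting it back down the same tree gives every node the global value, which is also one way a barrier can be built.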

46. Granted invention patent

US08117288B2 Optimizing layout of an application on a massively parallel supercomputer [Status: lapsed]
Publication No.: US08117288B2
Publication date: 2012-02-14
Application No.: US10963101
Filing date: 2004-10-12
Applicants: Gyan V. Bhanot, Alan Gara, Philip Heidelberger, Eoin M. Lawless, James C. Sexton, Robert E. Walkup
Inventors: Gyan V. Bhanot, Alan Gara, Philip Heidelberger, Eoin M. Lawless, James C. Sexton, Robert E. Walkup
IPC classification: G06F15/177
CPC classification: G06F9/5066
Abstract: A general computer-implemented method and apparatus to optimize problem layout on a massively parallel supercomputer is described. The method takes as input the communication matrix of an arbitrary problem in the form of an array whose entries C(i, j) are the amount of data communicated from domain i to domain j. Given C(i, j), a heuristic map is first implemented which attempts sequentially to map a domain and its communications neighbors either to the same supercomputer node or to near-neighbor nodes on the supercomputer torus while keeping the number of domains mapped to a supercomputer node constant (as much as possible). Next, a Markov chain of maps is generated from the initial map using Monte Carlo simulation with free energy (cost function) $F = \sum_{i,j} C(i,j)\,H(i,j)$, where H(i, j) is the smallest number of hops on the supercomputer torus between domain i and domain j. On the cases tested, it was found that the method produces good mappings and has the potential to be used as a general layout optimization tool for parallel codes. At the moment, the serial code implemented to test the method is unoptimized, so the computation time to find the optimum map can be several hours on a typical PC. For production implementation, good parallel code for our algorithm would be required, which could itself be implemented on a supercomputer.
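The cost function in this abstract, the sum over i and j of C(i, j) times H(i, j), can be made concrete with a short sketch. The abstract only says "Monte Carlo simulation"; the Metropolis acceptance rule, the swap move, and the one-domain-per-node placement below are assumptions chosen for illustration, and all function names are hypothetical.

```python
import math
import random


def torus_hops(a, b, dims):
    """Smallest number of hops between coordinates a and b on a torus."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def free_energy(comm, placement, dims):
    """F = sum over i, j of C(i, j) * H(i, j) for the given placement."""
    n = len(comm)
    return sum(comm[i][j] * torus_hops(placement[i], placement[j], dims)
               for i in range(n) for j in range(n))


def metropolis_step(comm, placement, dims, temperature, rng):
    """Propose swapping two domains and accept with the Metropolis rule.

    Swapping keeps the number of domains per node unchanged.
    """
    i, j = rng.sample(range(len(placement)), 2)
    before = free_energy(comm, placement, dims)
    placement[i], placement[j] = placement[j], placement[i]
    after = free_energy(comm, placement, dims)
    if after > before and rng.random() >= math.exp((before - after) / temperature):
        placement[i], placement[j] = placement[j], placement[i]  # reject: undo
    return placement


# Four domains on a 4-node ring; domains 0 and 1 exchange the most data
# but start two hops apart.
dims = (4,)
comm = [[0, 10, 0, 0],
        [10, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
placement = [(0,), (2,), (1,), (3,)]
initial_cost = free_energy(comm, placement, dims)

rng = random.Random(0)
for _ in range(200):
    metropolis_step(comm, placement, dims, temperature=0.5, rng=rng)
```

Recomputing F from scratch on every step is the simplest thing that works for a toy problem; a production version would update only the terms touched by the swap.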

47. Invention application

US20060101104A1 Optimizing layout of an application on a massively parallel supercomputer [Status: lapsed]
Publication No.: US20060101104A1
Publication date: 2006-05-11
Application No.: US10963101
Filing date: 2004-10-12
Applicants: Gyan Bhanot, Alan Gara, Philip Heidelberger, Eoin Lawless, James Sexton, Robert Walkup
Inventors: Gyan Bhanot, Alan Gara, Philip Heidelberger, Eoin Lawless, James Sexton, Robert Walkup
IPC classification: G06F1/16
CPC classification: G06F9/5066
Abstract: A general computer-implemented method and apparatus to optimize problem layout on a massively parallel supercomputer is described. The method takes as input the communication matrix of an arbitrary problem in the form of an array whose entries C(i, j) are the amount of data communicated from domain i to domain j. Given C(i, j), a heuristic map is first implemented which attempts sequentially to map a domain and its communications neighbors either to the same supercomputer node or to near-neighbor nodes on the supercomputer torus while keeping the number of domains mapped to a supercomputer node constant (as much as possible). Next, a Markov chain of maps is generated from the initial map using Monte Carlo simulation with free energy (cost function) $F = \sum_{i,j} C(i,j)\,H(i,j)$, where H(i, j) is the smallest number of hops on the supercomputer torus between domain i and domain j. On the cases tested, it was found that the method produces good mappings and has the potential to be used as a general layout optimization tool for parallel codes. At the moment, the serial code implemented to test the method is unoptimized, so the computation time to find the optimum map can be several hours on a typical PC. For production implementation, good parallel code for our algorithm would be required, which could itself be implemented on a supercomputer.

48. Granted invention patent

US09137098B2 T-Star interconnection network topology [Status: in force]
Publication No.: US09137098B2
Publication date: 2015-09-15
Application No.: US13584300
Filing date: 2012-08-13
Applicants: Dong Chen, Paul W. Coteus, Noel A. Eisley, Philip Heidelberger, Robert M. Senger, Yutaka Sugawara
Inventors: Dong Chen, Paul W. Coteus, Noel A. Eisley, Philip Heidelberger, Robert M. Senger, Yutaka Sugawara
IPC classification: H04L12/715, H04L12/24
CPC classification: H04L41/0663, H04L41/12, H04L41/145, H04L45/04
Abstract: According to one embodiment of the present invention, a method of constructing network communication for a grid of node groups is provided, the grid including an M-dimensional grid, each node group including N nodes, wherein M is greater than or equal to one and N is greater than one, wherein each node includes a router. The method includes directly connecting each node in each node group to every other node in the node group via intra-group links and directly connecting each node in each node group of the M-dimensional grid to a node in each neighboring node group in the M-dimensional grid via inter-group links.
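The construction in this abstract can be sketched as a link-enumeration routine. Two details are assumptions not taken from the abstract: inter-group links join nodes with the same index in adjacent groups, and the grid does not wrap around at its edges. The name `t_star_links` is hypothetical.

```python
from itertools import combinations, product


def t_star_links(grid_shape, n):
    """Enumerate links for an M-dimensional grid of node groups of n nodes each.

    A node is identified as (group_coordinate, index_within_group).
    Returns (intra_group_links, inter_group_links) as sets of node pairs.
    """
    intra, inter = set(), set()
    for group in product(*(range(size) for size in grid_shape)):
        # Intra-group links: every node connects to every other node in its group.
        for a, b in combinations(range(n), 2):
            intra.add(((group, a), (group, b)))
        # Inter-group links: connect to the next group along each grid axis.
        for axis, size in enumerate(grid_shape):
            if group[axis] + 1 < size:  # assumption: no wraparound at the edge
                neighbor = group[:axis] + (group[axis] + 1,) + group[axis + 1:]
                for k in range(n):
                    # Assumption: node k links to node k of the neighboring group.
                    inter.add(((group, k), (neighbor, k)))
    return intra, inter


# A 2 x 2 grid of groups (M = 2) with three nodes per group (N = 3).
intra, inter = t_star_links((2, 2), 3)
```

With these choices every node has N - 1 intra-group links plus one inter-group link per neighboring group, so a message reaches any node of an adjacent group in at most two hops.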

49. Invention application

US20140044015A1 T-STAR INTERCONNECTION NETWORK TOPOLOGY [Status: under examination, published]
Publication No.: US20140044015A1
Publication date: 2014-02-13
Application No.: US13584300
Filing date: 2012-08-13
Applicants: Dong Chen, Paul W. Coteus, Noel A. Eisley, Philip Heidelberger, Robert M. Senger, Yutaka Sugawara
Inventors: Dong Chen, Paul W. Coteus, Noel A. Eisley, Philip Heidelberger, Robert M. Senger, Yutaka Sugawara
IPC classification: H04L12/28
CPC classification: H04L41/0663, H04L41/12, H04L41/145, H04L45/04
Abstract: According to one embodiment of the present invention, a method of constructing network communication for a grid of node groups is provided, the grid including an M-dimensional grid, each node group including N nodes, wherein M is greater than or equal to one and N is greater than one, wherein each node includes a router. The method includes directly connecting each node in each node group to every other node in the node group via intra-group links and directly connecting each node in each node group of the M-dimensional grid to a node in each neighboring node group in the M-dimensional grid via inter-group links.

50. Granted invention patent

US08255638B2 Snoop filter for filtering snoop requests [Status: lapsed]
Publication No.: US08255638B2
Publication date: 2012-08-28
Application No.: US12113262
Filing date: 2008-05-01
Applicants: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk I. Hoenicke, Martin Ohmacht, Valentina Salapura, Pavlos M. Vranas
Inventors: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Dirk I. Hoenicke, Martin Ohmacht, Valentina Salapura, Pavlos M. Vranas
IPC classification: G06F12/00, G06F13/00
CPC classification: G06F12/0822, G06F12/0831, G06F2212/507, Y02D10/13
Abstract: A method and apparatus for supporting cache coherency in a multiprocessor computing environment having multiple processing units, each processing unit having one or more local cache memories associated and operatively connected therewith. The method comprises providing a snoop filter device associated with each processing unit, each snoop filter device having a plurality of dedicated input ports for receiving snoop requests from dedicated memory writing sources in the multiprocessor computing environment. Each snoop filter device includes a plurality of parallel operating port snoop filters in correspondence with the plurality of dedicated input ports, each port snoop filter implementing one or more parallel operating sub-filter elements that are adapted to concurrently filter snoop requests received from respective dedicated memory writing sources and forward a subset of those requests to its associated processing unit.
