    • 93. Patent application
    • SNOOP FILTERING SYSTEM IN A MULTIPROCESSOR SYSTEM
    • US20080222364A1
    • 2008-09-11
    • US12126674
    • 2008-05-23
    • Matthias A. Blumrich; Dong Chen; Alan G. Gara; Mark E. Giampapa; Philip Heidelberger; Dirk I. Hoenicke; Martin Ohmacht; Valentina Salapura; Pavlos M. Vranas
    • G06F12/08
    • G06F12/0831; G06F12/0813; Y02D10/13
    • A system and method for supporting cache coherency in a computing environment having multiple processing units, each with an associated cache memory system operatively coupled to it. The system includes a plurality of interconnected snoop filter units, each corresponding to and in communication with a respective processing unit. Each snoop filter unit comprises a plurality of devices for receiving asynchronous snoop requests from respective memory writing sources in the computing environment; a point-to-point interconnect comprising communication links that directly connect memory writing sources to corresponding receiving devices; and a plurality of parallel operating filter devices coupled in one-to-one correspondence with the receiving devices, each processing the snoop requests received there and either forwarding them to its associated processing unit or preventing them from being forwarded. Each of the parallel operating filter devices comprises parallel operating sub-filter elements, each simultaneously receiving an identical snoop request and implementing one or more different snoop filter algorithms to identify snoop requests for data that is not cached locally at the associated processing unit and to prevent those requests from being forwarded to the processing unit. In this manner, the number of snoop requests forwarded to a processing unit is reduced, thereby increasing the performance of the computing environment.
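The filtering decision described in the abstract for entry 93 can be sketched roughly as follows. This is a minimal illustration, not the patented design: the two sub-filter algorithms (an address-range check and a presence set) and all function names are assumptions made for the example, and the sub-filters are evaluated one after another here rather than in parallel hardware.

```python
from typing import Callable, Iterable, List, Set

# A sub-filter answers one question about a snooped address:
# True  -> the line MAY be cached locally (cannot rule it out)
# False -> the line is known not to be cached locally
SubFilter = Callable[[int], bool]


def make_range_filter(lo: int, hi: int) -> SubFilter:
    """Hypothetical sub-filter: only addresses in [lo, hi) can be cached."""
    return lambda addr: lo <= addr < hi


def make_presence_filter(cached_lines: Set[int]) -> SubFilter:
    """Hypothetical sub-filter: tracks the lines known to be in the cache."""
    return lambda addr: addr in cached_lines


def filter_snoops(requests: Iterable[int], sub_filters: List[SubFilter]) -> List[int]:
    """Forward a snoop request only if no sub-filter rules it out."""
    forwarded = []
    for addr in requests:
        # Every sub-filter sees the same request; one "not cached"
        # verdict is enough to drop it before it reaches the processor.
        if all(may_be_cached(addr) for may_be_cached in sub_filters):
            forwarded.append(addr)
    return forwarded


filters = [
    make_range_filter(0x1000, 0x2000),
    make_presence_filter({0x1040, 0x1080}),
]
forwarded = filter_snoops([0x0040, 0x1040, 0x10C0, 0x1080], filters)
print([hex(a) for a in forwarded])
```

Of the four requests in the example, only the two addresses that pass both sub-filters are forwarded, which is the reduction in snoop traffic the abstract refers to.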
    • 94. Granted patent
    • Optimizing layout of an application on a massively parallel supercomputer
    • US08117288B2
    • 2012-02-14
    • US10963101
    • 2004-10-12
    • Gyan V. Bhanot; Alan Gara; Philip Heidelberger; Eoin M. Lawless; James C. Sexton; Robert E. Walkup
    • G06F15/177
    • G06F9/5066
    • A general computer-implemented method and apparatus to optimize problem layout on a massively parallel supercomputer is described. The method takes as input the communication matrix of an arbitrary problem in the form of an array whose entries C(i, j) are the amount of data communicated from domain i to domain j. Given C(i, j), a heuristic map is first implemented which attempts sequentially to map a domain and its communication neighbors either to the same supercomputer node or to near-neighbor nodes on the supercomputer torus, while keeping the number of domains mapped to each supercomputer node constant (as much as possible). Next, a Markov chain of maps is generated from the initial map using Monte Carlo simulation with free energy (cost function) F = Σ_{i,j} C(i, j) H(i, j), where H(i, j) is the smallest number of hops on the supercomputer torus between domain i and domain j. On the cases tested, the method was found to produce good mappings and has the potential to be used as a general layout optimization tool for parallel codes. At the moment, the serial code implemented to test the method is unoptimized, so the computation time to find the optimum map can be several hours on a typical PC. A production implementation would require good parallel code for the algorithm, which could itself be run on a supercomputer.
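The cost function in the abstract for entry 94, F = Σ C(i, j) H(i, j), and a Monte Carlo search over maps can be sketched as below. This is only an illustration under stated assumptions: one domain per node, a pairwise-swap move, a fixed temperature, and Metropolis acceptance are choices made for the example, and none of the function names come from the patent.

```python
import math
import random


def torus_hops(a, b, dims):
    """Smallest hop count between nodes a and b on a torus with wraparound."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def cost(C, placement, dims):
    """F = sum over i, j of C[i][j] * H(i, j) for the given placement."""
    n = len(C)
    return sum(
        C[i][j] * torus_hops(placement[i], placement[j], dims)
        for i in range(n)
        for j in range(n)
    )


def anneal(C, placement, dims, steps=2000, temperature=1.0, seed=0):
    """Markov chain of maps: propose a swap, accept by the Metropolis rule."""
    rng = random.Random(seed)
    current = list(placement)
    f_cur = cost(C, current, dims)
    best, f_best = list(current), f_cur
    for _ in range(steps):
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]
        f_new = cost(C, current, dims)
        if f_new <= f_cur or rng.random() < math.exp((f_cur - f_new) / temperature):
            f_cur = f_new
            if f_new < f_best:
                best, f_best = list(current), f_new
        else:
            # Rejected move: undo the swap.
            current[i], current[j] = current[j], current[i]
    return best, f_best


# Four domains on a 1-D torus of four nodes; 0<->1 and 2<->3 exchange data.
C = [
    [0, 10, 0, 0],
    [10, 0, 0, 0],
    [0, 0, 0, 10],
    [0, 0, 10, 0],
]
dims = (4,)
start = [(0,), (2,), (1,), (3,)]  # communicating pairs start two hops apart
best, f_best = anneal(C, start, dims)
print(cost(C, start, dims), f_best)
```

In this toy case the starting map places each communicating pair two hops apart, and the search moves them onto neighboring nodes. A real run would start from the heuristic map the abstract describes rather than from an arbitrary placement.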
    • 95. Patent application
    • Optimizing layout of an application on a massively parallel supercomputer
    • US20060101104A1
    • 2006-05-11
    • US10963101
    • 2004-10-12
    • Gyan Bhanot; Alan Gara; Philip Heidelberger; Eoin Lawless; James Sexton; Robert Walkup
    • G06F1/16
    • G06F9/5066
    • A general computer-implemented method and apparatus to optimize problem layout on a massively parallel supercomputer is described. The method takes as input the communication matrix of an arbitrary problem in the form of an array whose entries C(i, j) are the amount of data communicated from domain i to domain j. Given C(i, j), a heuristic map is first implemented which attempts sequentially to map a domain and its communication neighbors either to the same supercomputer node or to near-neighbor nodes on the supercomputer torus, while keeping the number of domains mapped to each supercomputer node constant (as much as possible). Next, a Markov chain of maps is generated from the initial map using Monte Carlo simulation with free energy (cost function) F = Σ_{i,j} C(i, j) H(i, j), where H(i, j) is the smallest number of hops on the supercomputer torus between domain i and domain j. On the cases tested, the method was found to produce good mappings and has the potential to be used as a general layout optimization tool for parallel codes. At the moment, the serial code implemented to test the method is unoptimized, so the computation time to find the optimum map can be several hours on a typical PC. A production implementation would require good parallel code for the algorithm, which could itself be run on a supercomputer.
    • 98. Granted patent
    • Increasing available FIFO space to prevent messaging queue deadlocks in a DMA environment
    • US08112559B2
    • 2012-02-07
    • US12241634
    • 2008-09-30
    • Michael A. Blocksome; Dong Chen; Thomas Gooding; Philip Heidelberger; Jeff Parker
    • G06F13/28; G06F15/167
    • G06F13/28
    • Embodiments of the invention may be used to manage message queues in a parallel computing environment to prevent message queue deadlock. A direct memory access controller of a compute node may determine when a messaging queue is full. In response, the DMA may generate an interrupt. An interrupt handler may stop the DMA and swap all descriptors from the full messaging queue into a larger queue (or enlarge the original queue). The interrupt handler then restarts the DMA. Alternatively, the interrupt handler stops the DMA, allocates a memory block to hold queue data, and then moves descriptors from the full messaging queue into the allocated memory block. The interrupt handler then restarts the DMA. During a normal messaging advance cycle, a messaging manager attempts to inject the descriptors in the memory block into other messaging queues until the descriptors have all been processed.
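The first recovery path in the abstract for entry 98 (stop the DMA, move every descriptor into a larger queue, restart the DMA) can be modelled in a few lines. This is a software toy under assumptions: the class names, the doubling growth factor, and the `inject` helper are all hypothetical, and the "interrupt" is simply a function call made when the queue is found full.

```python
from collections import deque


class InjectionFifo:
    """Bounded queue of message descriptors (hypothetical model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.descriptors = deque()

    def is_full(self):
        return len(self.descriptors) >= self.capacity

    def push(self, descriptor):
        if self.is_full():
            raise OverflowError("FIFO full")
        self.descriptors.append(descriptor)


class DmaEngine:
    """Holds the active FIFO and a running flag; no real hardware involved."""

    def __init__(self, fifo):
        self.fifo = fifo
        self.running = True


def on_fifo_full(dma, growth=2):
    """Interrupt handler sketch: stop DMA, swap into a larger FIFO, restart."""
    dma.running = False
    bigger = InjectionFifo(dma.fifo.capacity * growth)
    while dma.fifo.descriptors:
        bigger.push(dma.fifo.descriptors.popleft())  # preserves order
    dma.fifo = bigger
    dma.running = True


def inject(dma, descriptor):
    """Queue a descriptor, growing the FIFO first if it is already full."""
    if dma.fifo.is_full():
        on_fifo_full(dma)
    dma.fifo.push(descriptor)


dma = DmaEngine(InjectionFifo(capacity=2))
for name in ["a", "b", "c"]:
    inject(dma, name)
print(dma.fifo.capacity, list(dma.fifo.descriptors))
```

The alternative path in the abstract, which parks descriptors in a separately allocated memory block and re-injects them during later advance cycles, is not modelled here.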
    • 99. Patent application
    • I/O ROUTING IN A MULTIDIMENSIONAL TORUS NETWORK
    • US20110173349A1
    • 2011-07-14
    • US12697175
    • 2010-01-29
    • Dong Chen; Noel A. Eisley; Philip Heidelberger
    • G06F3/00
    • H04L45/00; G06F15/17387; H04L45/06
    • A method, system and computer program product are disclosed for routing data packets in a computing system comprising a multidimensional torus compute node network including a multitude of compute nodes, and an I/O node network including a plurality of I/O nodes. In one embodiment, the method comprises assigning to each of the data packets a destination address identifying one of the compute nodes; providing each of the data packets with a toio value; routing the data packets through the compute node network to the destination addresses of the data packets; and when each of the data packets reaches the destination address assigned to said each data packet, routing said each data packet to one of the I/O nodes if the toio value of said each data packet is a specified value. In one embodiment, each of the data packets is also provided with an ioreturn value used to route the data packets through the compute node network.
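The two-stage routing in the abstract for entry 99 (travel through the compute torus to the destination node, then hop to an I/O node when the toio value is set) can be sketched as follows. Several things here are assumptions made for the example: dimension-order routing is used for the torus stage, toio is modelled as a boolean, the `io_links` table mapping a compute node to its attached I/O node is hypothetical, and the ioreturn value is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    dest: tuple          # destination compute node coordinates
    toio: bool           # True -> leave the torus at dest toward an I/O node
    payload: str = ""


def next_hop(cur, dest, dims):
    """Dimension-order step: fix the first differing coordinate, shorter way round."""
    for k, (c, d, n) in enumerate(zip(cur, dest, dims)):
        if c != d:
            forward = (d - c) % n
            step = 1 if forward <= n - forward else -1
            nxt = list(cur)
            nxt[k] = (c + step) % n
            return tuple(nxt)
    return cur


def route(packet, src, dims, io_links):
    """Return the list of nodes visited, ending at an I/O node if toio is set."""
    path = [src]
    while path[-1] != packet.dest:
        path.append(next_hop(path[-1], packet.dest, dims))
    if packet.toio:
        # Assumes the destination compute node has an attached I/O link.
        path.append(io_links[packet.dest])
    return path


dims = (4, 4)
io_links = {(3, 1): "io0"}  # hypothetical attachment table
to_io = route(Packet(dest=(3, 1), toio=True), (0, 0), dims, io_links)
stay = route(Packet(dest=(3, 1), toio=False), (0, 0), dims, io_links)
print(to_io)
print(stay)
```

Both packets follow the same path through the torus; only the one with toio set takes the extra hop onto the I/O network.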
    • 100. Patent application
    • ZONE ROUTING IN A TORUS NETWORK
    • US20110173343A1
    • 2011-07-14
    • US12684184
    • 2010-01-08
    • Dong Chen; Philip Heidelberger; Sameer Kumar
    • G06F15/173
    • G06F15/17381
    • A system for routing data in a network comprises a network logic device at a sending node for determining a path between the sending node and a receiving node, wherein the network logic device sets one or more selection bits and one or more hint bits within the data packet, and a control register for storing one or more masks. The network logic device uses the selection bits to select a mask from the control register and applies the selected mask to the hint bits to restrict routing of the data packet to one or more routing directions within the network. It then selects one of the restricted routing directions and sends the data packet along a link in the selected direction toward the receiving node.
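The mask-selection step in the abstract for entry 100 amounts to a table lookup followed by a bitwise AND. The sketch below assumes a four-direction 2-D torus, a one-bit-per-direction hint encoding, and a "take the first permitted direction" policy; these details and the names are illustrative assumptions, not taken from the patent.

```python
# Bit k of the hint field corresponds to DIRECTIONS[k] (assumed encoding).
DIRECTIONS = ["x+", "x-", "y+", "y-"]


def allowed_directions(hint_bits, selection_bits, mask_register):
    """Select a mask with the selection bits, then AND it with the hint bits."""
    mask = mask_register[selection_bits]
    restricted = hint_bits & mask
    return [d for k, d in enumerate(DIRECTIONS) if (restricted >> k) & 1]


def choose_direction(hint_bits, selection_bits, mask_register):
    """Pick one permitted direction; here simply the first one (assumed policy)."""
    allowed = allowed_directions(hint_bits, selection_bits, mask_register)
    if not allowed:
        raise ValueError("mask leaves no routing direction")
    return allowed[0]


# Control register holding two masks: zone 0 permits x moves, zone 1 permits y moves.
mask_register = [0b0011, 0b1100]
hint = 0b0101  # the packet could make progress along x+ or y+

first = choose_direction(hint, 0, mask_register)
second = choose_direction(hint, 1, mask_register)
print(first, second)
```

With the same hint bits, changing only the selection bits steers the packet along a different dimension, which is how the masks partition routing into zones.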