    • 82. Invention Application
    • Title: Performing an Allreduce Operation Using Shared Memory
    • Publication No.: US20080301683A1
    • Publication Date: 2008-12-04
    • Application No.: US11754782
    • Filing Date: 2007-05-29
    • Inventors: Charles J. Archer; Gabor Dozsa; Joseph D. Ratterman; Brian E. Smith
    • IPC: G06F9/46
    • CPC: G06F9/4843; G06F9/52; G06F9/546
    • Abstract: Methods, apparatus, and products are disclosed for performing an allreduce operation using shared memory that include: receiving, by at least one of a plurality of processing cores on a compute node, an instruction to perform an allreduce operation; establishing, by the core that received the instruction, a job status object for specifying a plurality of shared memory allreduce work units, the plurality of shared memory allreduce work units together performing the allreduce operation on the compute node; determining, by an available core on the compute node, a next shared memory allreduce work unit in the job status object; and performing, by that available core on the compute node, that next shared memory allreduce work unit.
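The abstract above describes a work-stealing style of shared-memory allreduce: one core builds a job status object listing work units, and whichever core is available next claims and executes the next unit. The C sketch below illustrates only that control flow, under stated assumptions: the work-unit granularity (one slice of the reduce buffer per unit), the atomic next-unit counter, and the use of POSIX threads to stand in for processing cores are illustrative choices, not the patented implementation.

```c
/* Minimal sketch of the shared-memory allreduce scheme described above:
 * a job status object lists work units, and any available core claims the
 * next one via an atomic counter.  Slice-per-unit granularity and pthread
 * "cores" are assumptions made for illustration. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define CORES  4
#define UNITS  8          /* shared-memory allreduce work units       */
#define SLICE  1024       /* elements reduced by one work unit        */

static double input[CORES][UNITS * SLICE];   /* per-core contributions  */
static double result[UNITS * SLICE];         /* shared allreduce result */

/* Job status object: which work unit should the next available core take? */
static struct { atomic_int next_unit; } job_status = { 0 };

static void *available_core(void *arg)
{
    (void)arg;
    for (;;) {
        /* determine the next shared-memory allreduce work unit */
        int u = atomic_fetch_add(&job_status.next_unit, 1);
        if (u >= UNITS)
            break;
        /* perform that work unit: sum every core's slice u into result */
        for (int i = u * SLICE; i < (u + 1) * SLICE; i++) {
            double sum = 0.0;
            for (int c = 0; c < CORES; c++)
                sum += input[c][i];
            result[i] = sum;   /* visible to all cores via shared memory */
        }
    }
    return NULL;
}

int main(void)
{
    for (int c = 0; c < CORES; c++)
        for (int i = 0; i < UNITS * SLICE; i++)
            input[c][i] = 1.0;

    pthread_t t[CORES];
    for (int c = 0; c < CORES; c++)
        pthread_create(&t[c], NULL, available_core, NULL);
    for (int c = 0; c < CORES; c++)
        pthread_join(t[c], NULL);

    printf("result[0] = %.1f (expected %d)\n", result[0], CORES);
    return 0;
}
```

Compile with cc -pthread; the atomic fetch-and-add hands each available core a distinct work unit without any further coordination.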
    • 83. Invention Application
    • Title: Low Latency, High Bandwidth Data Communications Between Compute Nodes in a Parallel Computer
    • Publication No.: US20080281997A1
    • Publication Date: 2008-11-13
    • Application No.: US11746333
    • Filing Date: 2007-05-09
    • Inventors: Charles J. Archer; Michael A. Blocksome; Joseph D. Ratterman; Brian E. Smith
    • IPC: G06F13/28
    • CPC: G06F13/4269
    • Abstract: Methods, parallel computers, and computer program products are disclosed for low latency, high bandwidth data communications between compute nodes in a parallel computer. Embodiments include receiving, by an origin direct memory access (‘DMA’) engine of an origin compute node, data for transfer to a target compute node; sending, by the origin DMA engine of the origin compute node to a target DMA engine on the target compute node, a request to send (‘RTS’) message; transferring, by the origin DMA engine, a predetermined portion of the data to the target compute node using a memory FIFO operation; determining, by the origin DMA engine, whether an acknowledgement of the RTS message has been received from the target DMA engine; if an acknowledgement of the RTS message has not been received, transferring, by the origin DMA engine, another predetermined portion of the data to the target compute node using a memory FIFO operation; and if the acknowledgement of the RTS message has been received by the origin DMA engine, transferring, by the origin DMA engine, any remaining portion of the data to the target compute node using a direct put operation.
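The scheme in this abstract is a hybrid of eager and rendezvous transfer: after sending the RTS, the origin DMA engine keeps streaming fixed-size portions through memory FIFO packets, and once the target acknowledges the RTS it switches to a single direct put for the remainder. The sketch below mimics only that decision logic; the chunk size, the polled acknowledgement check, and the two helper functions standing in for the DMA engine's FIFO and direct-put mechanisms are assumptions made for illustration.

```c
/* Control-flow sketch of the eager/rendezvous hybrid described above. */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define CHUNK 512   /* predetermined portion transferred per FIFO operation */

/* stand-ins for the origin DMA engine's two transfer mechanisms */
static void memory_fifo_send(const char *data, size_t off, size_t len)
{
    printf("FIFO packet: bytes %zu..%zu\n", off, off + len - 1);
    (void)data;
}
static void direct_put(const char *data, size_t off, size_t len)
{
    printf("direct put:  bytes %zu..%zu\n", off, off + len - 1);
    (void)data;
}

/* polled by the origin; true once the target DMA engine has acknowledged
 * the RTS (here simply after a fixed number of polls, an assumption) */
static bool rts_ack_received(int poll_count) { return poll_count >= 3; }

static void origin_dma_transfer(const char *data, size_t total)
{
    size_t sent = 0;
    int polls = 0;

    printf("send RTS to target DMA engine\n");

    /* until the RTS ack arrives, keep pushing predetermined portions */
    while (sent < total && !rts_ack_received(polls++)) {
        size_t len = (total - sent < CHUNK) ? total - sent : CHUNK;
        memory_fifo_send(data, sent, len);
        sent += len;
    }
    /* ack received (or data exhausted): direct put any remaining portion */
    if (sent < total)
        direct_put(data, sent, total - sent);
}

int main(void)
{
    static char payload[4096];
    origin_dma_transfer(payload, sizeof payload);
    return 0;
}
```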
    • 87. Invention Grant
    • Title: Send-side matching of data communications messages
    • Publication No.: US08776081B2
    • Publication Date: 2014-07-08
    • Application No.: US12881863
    • Filing Date: 2010-09-14
    • Inventors: Charles J. Archer; Michael A. Blocksome; Joseph D. Ratterman; Brian E. Smith
    • IPC: G06F13/00; G06F15/16; G06F15/173; G06F9/54; G06F9/46
    • CPC: G06F9/546; G06F9/46; G06F9/52; G06F15/16; G06F15/17312
    • Abstract: Send-side matching of data communications messages includes a plurality of compute nodes organized for collective operations, including: issuing by a receiving node to source nodes a receive message that specifies receipt of a single message to be sent from any source node, the receive message including message matching information, a specification of a hardware-level mutual exclusion device, and an identification of a receive buffer; matching by two or more of the source nodes the receive message with pending send messages in the two or more source nodes; operating by one of the source nodes having a matching send message the mutual exclusion device, excluding messages from other source nodes with matching send messages and identifying to the receiving node the source node operating the mutual exclusion device; and sending to the receiving node from the source node operating the mutual exclusion device a matched pending message.
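Send-side matching inverts the usual receive-side matching: the receiver advertises one wildcard receive, every source with a matching pending send races on a mutual exclusion device, and only the winner delivers its message. A minimal sketch of that race follows, with pthread_mutex_trylock standing in for the hardware-level mutual exclusion device and a simple integer tag standing in for the message matching information; both are assumptions, not the patented mechanism.

```c
/* Sketch of send-side matching: one published receive, many candidate
 * senders, and a mutual-exclusion device that picks exactly one. */
#include <pthread.h>
#include <stdio.h>

#define SOURCES     4
#define WANTED_TAG  7   /* message matching information in the receive */

static pthread_mutex_t exclusion_device = PTHREAD_MUTEX_INITIALIZER;
static int winner = -1;          /* source node that operated the device */
static int receive_buffer = 0;   /* buffer identified in the receive     */

struct source { int rank; int pending_tag; int payload; };

static void *source_node(void *arg)
{
    struct source *s = arg;

    /* match the broadcast receive against this node's pending send */
    if (s->pending_tag != WANTED_TAG)
        return NULL;

    /* operate the mutual exclusion device; losing senders drop out */
    if (pthread_mutex_trylock(&exclusion_device) == 0) {
        winner = s->rank;              /* identify self to the receiver    */
        receive_buffer = s->payload;   /* send the matched pending message */
        /* device stays held so no other matching source can also send */
    }
    return NULL;
}

int main(void)
{
    struct source nodes[SOURCES] = {
        {0, 3, 100}, {1, 7, 111}, {2, 7, 222}, {3, 5, 300}
    };
    pthread_t t[SOURCES];

    for (int i = 0; i < SOURCES; i++)
        pthread_create(&t[i], NULL, source_node, &nodes[i]);
    for (int i = 0; i < SOURCES; i++)
        pthread_join(t[i], NULL);

    printf("receive matched by source %d, value %d\n", winner, receive_buffer);
    return 0;
}
```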
    • 89. Invention Grant
    • Title: Performing an allreduce operation using shared memory
    • Publication No.: US08752051B2
    • Publication Date: 2014-06-10
    • Application No.: US13427057
    • Filing Date: 2012-03-22
    • Inventors: Charles J. Archer; Gabor Dozsa; Joseph D. Ratterman; Brian E. Smith
    • IPC: G06F9/46; G06F9/48; G06F9/52
    • CPC: G06F9/4843; G06F9/52; G06F9/546
    • Abstract: Methods, apparatus, and products are disclosed for performing an allreduce operation using shared memory that include: receiving, by at least one of a plurality of processing cores on a compute node, an instruction to perform an allreduce operation; establishing, by the core that received the instruction, a job status object for specifying a plurality of shared memory allreduce work units, the plurality of shared memory allreduce work units together performing the allreduce operation on the compute node; determining, by an available core on the compute node, a next shared memory allreduce work unit in the job status object; and performing, by that available core on the compute node, that next shared memory allreduce work unit.
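This grant shares its abstract with application US20080301683A1 above. As a complement to the worker-loop sketch given there, the snippet below shows one hypothetical way the core that receives the allreduce instruction could establish the job status object itself, i.e. the ordered list of work units that available cores later claim. The descriptor fields and the reduce-then-copy-out decomposition are illustrative assumptions, not the patented layout.

```c
/* Illustrative job status object: the core that received the allreduce
 * instruction fills in one descriptor per shared-memory work unit. */
#include <stdio.h>

#define CORES   4
#define SLICES  8

enum unit_kind { UNIT_REDUCE, UNIT_COPY_OUT };

struct work_unit {
    enum unit_kind kind;   /* what this unit does                    */
    int slice;             /* which slice of the buffer it covers    */
    int target_core;       /* for COPY_OUT: whose buffer receives it */
};

struct job_status {
    int count;                                  /* units specified          */
    int next;                                   /* claimed by available cores */
    struct work_unit units[SLICES * (1 + CORES)];
};

/* Executed once by the core that received the allreduce instruction. */
static void establish_job_status(struct job_status *js)
{
    js->count = 0;
    js->next = 0;
    for (int s = 0; s < SLICES; s++)            /* reduce each slice once */
        js->units[js->count++] = (struct work_unit){ UNIT_REDUCE, s, -1 };
    for (int s = 0; s < SLICES; s++)            /* then fan the result out */
        for (int c = 0; c < CORES; c++)
            js->units[js->count++] = (struct work_unit){ UNIT_COPY_OUT, s, c };
}

int main(void)
{
    static struct job_status js;
    establish_job_status(&js);
    printf("job status object lists %d work units\n", js.count);
    return 0;
}
```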
    • 90. Invention Application
    • Title: Optimizing Collective Communications Within A Parallel Computer
    • Publication No.: US20140047451A1
    • Publication Date: 2014-02-13
    • Application No.: US13569614
    • Filing Date: 2012-08-08
    • Inventors: Charles J. Archer; Michael A. Blocksome; Joseph D. Ratterman; Brian E. Smith
    • IPC: G06F9/46
    • CPC: G06F9/5061; G06F2209/505
    • Abstract: Methods, apparatuses, and computer program products are provided for optimizing collective communications within a parallel computer comprising a plurality of hardware threads for executing software threads of a parallel application. Embodiments include a processor of a parallel computer determining, for each software thread, an affinity of the software thread to a particular hardware thread. Each affinity indicates an assignment of a software thread to a particular hardware thread. The processor also generates one or more affinity domains based on the affinities of the software threads. Embodiments also include a processor generating, for each affinity domain, a topology of the affinity domain based on the affinities of the software threads to the hardware threads. According to embodiments of the present application, a processor also performs, based on the generated topologies of the affinity domains, a collective operation on one or more software threads.
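The abstract describes grouping software threads into affinity domains according to which hardware threads they are pinned to, deriving a topology per domain, and then running collectives over that structure. The sketch below shows the idea at its simplest, under assumptions not taken from the patent: one affinity domain per physical core, two hardware threads per core, and a sum reduction staged as intra-domain followed by inter-domain combining.

```c
/* Sketch of affinity-domain staging for a collective sum reduction. */
#include <stdio.h>

#define SW_THREADS   8
#define HW_PER_CORE  2   /* hardware threads per core -> per affinity domain */

/* affinity: software thread i is assigned to hardware thread affinity[i] */
static const int affinity[SW_THREADS] = { 0, 1, 2, 3, 4, 5, 6, 7 };

/* generate affinity domains: here, one domain per physical core */
static int domain_of(int sw_thread)
{
    return affinity[sw_thread] / HW_PER_CORE;
}

int main(void)
{
    int contribution[SW_THREADS];
    int domain_sum[SW_THREADS / HW_PER_CORE] = { 0 };
    int total = 0;

    for (int i = 0; i < SW_THREADS; i++)
        contribution[i] = i + 1;

    /* stage 1: collective within each affinity domain (shared cache, cheap) */
    for (int i = 0; i < SW_THREADS; i++)
        domain_sum[domain_of(i)] += contribution[i];

    /* stage 2: collective across the domain topology (one value per domain) */
    for (int d = 0; d < SW_THREADS / HW_PER_CORE; d++) {
        printf("affinity domain %d partial sum = %d\n", d, domain_sum[d]);
        total += domain_sum[d];
    }
    printf("collective result = %d\n", total);
    return 0;
}
```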