    • 33. Granted invention patent
    • Arithmetic functions in torus and tree networks
    • US07313582B2
    • 2007-12-25
    • US10468991
    • 2002-02-25
    • Gyan Bhanot, Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas
    • G06F7/38
    • G06F15/17337
    • Methods and systems for performing arithmetic functions. In accordance with a first aspect of the invention, methods and apparatus are provided that combine software algorithms with a hardware implementation of class network routing to significantly reduce the time required for global arithmetic operations on the torus, which in turn improves the scalability of applications running on large parallel machines. The invention improves the efficiency and accuracy of global operations in three steps: (1) ensuring, when necessary, that all nodes perform the global operation on the data in the same order and so obtain a unique answer, independent of roundoff error; (2) using the topology of the torus to minimize the number of hops, and the bidirectional capabilities of the network to minimize the number of time steps in the data transfer operation; and (3) using class function routing to reduce latency in the data transfer. With this method, every element is injected into the network only once and is then stored and forwarded without any further software overhead. In accordance with a second aspect of the invention, methods and systems are provided to efficiently implement global arithmetic operations on a network that supports global combining operations. These methods greatly reduce the latency of such global operations.
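The first two steps of the abstract for entry 33 (same-order reduction and bidirectional transfer on the torus) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the patented implementation: it models a one-dimensional torus (a ring) in plain Python, the names `ring_allgather`, `ordered_sum`, and `arrival_order_sum` are hypothetical, and class function routing, which the abstract describes as hardware, is not modeled at all.

```python
def ring_allgather(values):
    """Simulate an all-gather on a ring (1-D torus) using both directions.

    Assumption: each node injects its value once, and the value is then
    forwarded one hop per time step clockwise and counter-clockwise.
    Returns the per-node buffers (source node -> value, in arrival order)
    and the number of time steps used.
    """
    n = len(values)
    buffers = [{i: values[i]} for i in range(n)]
    steps = n // 2  # farthest node on a bidirectional ring is n // 2 hops away
    for hop in range(1, steps + 1):
        for node in range(n):
            # Values that originated `hop` nodes away in each direction.
            for src in ((node - hop) % n, (node + hop) % n):
                buffers[node][src] = values[src]
    return buffers, steps


def arrival_order_sum(buffer):
    """Sum in the order the values arrived, which differs from node to node."""
    total = 0.0
    for value in buffer.values():
        total += value
    return total


def ordered_sum(buffer):
    """Sum in ascending source-node order, identical on every node."""
    total = 0.0
    for src in sorted(buffer):
        total += buffer[src]
    return total


values = [1e16, 1.0, -1e16, 1.0]
buffers, steps = ring_allgather(values)
arrival_sums = [arrival_order_sum(b) for b in buffers]
ordered_sums = [ordered_sum(b) for b in buffers]
print("time steps:", steps)
print("arrival-order sums:", arrival_sums)
print("fixed-order sums:  ", ordered_sums)
```

Because floating-point addition is not associative, the arrival-order sums can disagree between nodes for inputs like these, while the fixed-order sums are bitwise identical everywhere. Using both ring directions halves the number of transfer steps compared with forwarding in one direction only.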
    • 34. Granted invention patent
    • Method and apparatus for efficiently tracking queue entries relative to a timestamp
    • US08756350B2
    • 2014-06-17
    • US11768800
    • 2007-06-26
    • Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Martin Ohmacht, Valentina Salapura, Pavlos Vranas
    • G06F3/00; G06F5/00
    • G06F12/0835; G06F12/0831
    • An apparatus and method for tracking coherence event signals transmitted in a multiprocessor system. The apparatus comprises a coherence logic unit having a plurality of queue structures, each queue structure associated with a respective sender of event signals transmitted in the system. A timing circuit associated with a queue structure controls enqueuing and dequeuing of received coherence event signals, and a counter tracks the coherence event signals remaining enqueued in the queue structure and those dequeued since receipt of a timestamp signal. A counter mechanism generates an output signal indicating that all of the coherence event signals present in the queue structure at the time of receipt of the timestamp signal have been dequeued. In one embodiment, the timestamp signal is asserted at the start of a memory synchronization operation, and the output signal indicates that all coherence events present when the timestamp signal was asserted have completed. This signal can then be used as part of the completion condition for the memory synchronization operation.
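The counter mechanism described in the abstract for entry 34 can be modeled in software. The following is a minimal sketch, assuming a strict FIFO queue and a single outstanding timestamp per queue; the patent describes hardware, and the class name `TimestampedQueue`, its methods, and the helper `sync_complete` are hypothetical names introduced only for illustration.

```python
from collections import deque


class TimestampedQueue:
    """Software model of one queue with a timestamp-tracking counter."""

    def __init__(self):
        self._queue = deque()
        self._pending = 0    # entries present at the timestamp, not yet dequeued
        self._armed = False  # True once a timestamp has been received

    def enqueue(self, event):
        # Arrivals after the timestamp are queued but never counted.
        self._queue.append(event)

    def timestamp(self):
        """Snapshot the current occupancy when the timestamp signal arrives."""
        self._pending = len(self._queue)
        self._armed = True

    def dequeue(self):
        event = self._queue.popleft()
        # FIFO order guarantees the first `_pending` dequeues after the
        # timestamp are exactly the entries that were present at that time.
        if self._armed and self._pending > 0:
            self._pending -= 1
        return event

    def drained(self):
        """True once every entry present at the timestamp has been dequeued."""
        return self._armed and self._pending == 0


def sync_complete(queues):
    """Completion condition (assumed): every per-sender queue has drained."""
    return all(q.drained() for q in queues)


q = TimestampedQueue()
q.enqueue("inv A")
q.enqueue("inv B")
q.timestamp()        # start of a memory synchronization operation
q.enqueue("inv C")   # arrives later, so it does not delay completion
q.dequeue()
print(q.drained())   # one tracked entry is still enqueued
q.dequeue()
print(q.drained())   # both tracked entries have been dequeued
```

Counting down from a snapshot avoids tagging individual entries: the model needs only one counter per queue, and entries that arrive after the timestamp never affect the output signal.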