    • 42. Invention application
    • Reducing Bandwidth Requirements for Matrix Multiplication
    • US20090300091A1
    • 2009-12-03
    • US12129789
    • 2008-05-30
    • Daniel A. Brokenshire; John A. Gunnels; Michael D. Kistler
    • Daniel A. Brokenshire; John A. Gunnels; Michael D. Kistler
    • G06F7/52
    • G06F17/16
    • A block matrix multiplication mechanism is provided for reversing the visitation order of blocks at corner turns when performing a block matrix multiplication operation in a data processing system. The mechanism increases block size and divides each block into sub-blocks. By reversing the visitation order, the mechanism eliminates a sub-block load at the corner turns. The mechanism performs sub-block matrix multiplication for each sub-block in a given block, and then repeats operation for a next block until all blocks are computed. The mechanism may determine block size and sub-block size to optimize load balancing and memory bandwidth. Therefore, the mechanism reduces maximum throughput and increases performance. In addition, the mechanism also reduces the number of multi-buffered local store buffers.
    • 48. Granted invention patent
    • Reducing bandwidth requirements for matrix multiplication
    • US08250130B2
    • 2012-08-21
    • US12129789
    • 2008-05-30
    • Daniel A. Brokenshire; John A. Gunnels; Michael D. Kistler
    • Daniel A. Brokenshire; John A. Gunnels; Michael D. Kistler
    • G06F17/16
    • G06F17/16
    • A block matrix multiplication mechanism is provided for reversing the visitation order of blocks at corner turns when performing a block matrix multiplication operation in a data processing system. The mechanism increases block size and divides each block into sub-blocks. By reversing the visitation order, the mechanism eliminates a sub-block load at the corner turns. The mechanism performs sub-block matrix multiplication for each sub-block in a given block, and then repeats operation for a next block until all blocks are computed. The mechanism may determine block size and sub-block size to optimize load balancing and memory bandwidth. Therefore, the mechanism reduces maximum throughput and increases performance. In addition, the mechanism also reduces the number of multi-buffered local store buffers.
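
The abstract's central idea, reversing the visitation order at corner turns so that a block already in local store is reused, can be illustrated with a small sketch. The code below is only one reading of the abstract, not the patented mechanism: it assumes square matrices whose size is a multiple of the block size, models the local store as a single held block per operand, and ignores the sub-block division and multi-buffering the abstract also mentions. The names `blocked_matmul`, `reference_matmul`, `held`, and `loads` are hypothetical.

```python
def blocked_matmul(a, b, bs, reverse_at_turns=True):
    """Multiply square matrices a and b block by block.

    Returns (c, loads), where loads counts how many operand blocks had to
    be fetched into a toy local store that holds one A block and one B block.
    Assumes len(a) is a multiple of the block size bs.
    """
    n = len(a)
    nb = n // bs
    c = [[0] * n for _ in range(n)]
    held = {"A": None, "B": None}  # block index currently held per operand
    loads = 0
    forward = True
    for i in range(nb):
        for j in range(nb):
            # Direction in which the inner (k) dimension is walked for this C block.
            ks = range(nb) if forward else range(nb - 1, -1, -1)
            for k in ks:
                for name, idx in (("A", (i, k)), ("B", (k, j))):
                    if held[name] != idx:  # fetch only if not already held
                        held[name] = idx
                        loads += 1
                # Multiply-accumulate A(i,k) * B(k,j) into C(i,j).
                for r in range(i * bs, (i + 1) * bs):
                    for s in range(j * bs, (j + 1) * bs):
                        c[r][s] += sum(
                            a[r][t] * b[t][s]
                            for t in range(k * bs, (k + 1) * bs)
                        )
            if reverse_at_turns:
                # Corner turn: walk k the other way for the next C block,
                # so the last A block of this row is reused when i is unchanged.
                forward = not forward
    return c, loads


def reference_matmul(a, b):
    """Plain triple-loop product used to check the blocked result."""
    n = len(a)
    return [
        [sum(a[r][t] * b[t][s] for t in range(n)) for s in range(n)]
        for r in range(n)
    ]


n, bs = 8, 2
a = [[r + s for s in range(n)] for r in range(n)]
b = [[r * s + 1 for s in range(n)] for r in range(n)]
c_plain, loads_plain = blocked_matmul(a, b, bs, reverse_at_turns=False)
c_rev, loads_rev = blocked_matmul(a, b, bs, reverse_at_turns=True)
print(loads_plain, loads_rev)
```

In this toy model both traversals produce the same product, and the reversed order saves one A-block fetch at every turn that stays within the same block row of C. How much a real system saves depends on local store capacity, sub-block sizing, and buffering, none of which are modeled here.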