Quick Patent Search - fast search of global patents, free commercial patent database - IPRDB

41. Granted invention patent

US07844630B2 Method and structure for fast in-place transformation of standard full and packed matrix data formats (status: lapsed)
Publication number: US07844630B2
Publication date: 2010-11-30
Application number: US12033581
Filing date: 2008-02-19
Applicants: Fred Gehrung Gustavson, John A. Gunnels, James C. Sexton
Inventors: Fred Gehrung Gustavson, John A. Gunnels, James C. Sexton
IPC classification: G06F17/00
CPC classification: G06F17/16, G06F7/76
Abstract: A computerized method provides for an in-place transformation of matrix A data including a New Data Structure (NDS) format and a transformation T having a compact representation. The NDS represents data of the matrix A in a format other than a row major format or a column major format, such that the data for the matrix A is stored as contiguous sub matrices of size MB by NB in an order predetermined to provide the data for a matrix processing. The transformation T is applied to the MB by NB blocks, using an in-place transformation processing, thereby replacing data of the block A1 with the contents of T(A1).
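The block-contiguous storage described in this abstract can be illustrated with a small Python sketch. It is not the patented method: it copies into a new list rather than working in place, and the function name to_block_layout and the row-by-row ordering of the blocks are assumptions made only for this illustration. It shows what "contiguous sub matrices of size MB by NB" might look like for a matrix held as a list of rows.

```python
def to_block_layout(a, mb, nb):
    """Return the elements of matrix `a` with each mb-by-nb block stored contiguously.

    Hypothetical helper for illustration only: blocks are emitted row of blocks
    by row of blocks, and elements inside a block are emitted row by row.
    """
    rows, cols = len(a), len(a[0])
    if rows % mb or cols % nb:
        raise ValueError("matrix dimensions must be multiples of the block size")
    flat = []
    for r0 in range(0, rows, mb):          # top row of the current block
        for c0 in range(0, cols, nb):      # left column of the current block
            for i in range(mb):            # copy the block one row at a time
                flat.extend(a[r0 + i][c0:c0 + nb])
    return flat


if __name__ == "__main__":
    matrix = [[1, 2, 3, 4],
              [5, 6, 7, 8]]
    print(to_block_layout(matrix, 2, 2))
```

With a 2 by 4 matrix and 2 by 2 blocks, the first four output elements belong to the left block and the last four to the right block.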

42. Invention patent application

US20090300091A1 Reducing Bandwidth Requirements for Matrix Multiplication (status: in force)
Publication number: US20090300091A1
Publication date: 2009-12-03
Application number: US12129789
Filing date: 2008-05-30
Applicants: Daniel A. Brokenshire, John A. Gunnels, Michael D. Kistler
Inventors: Daniel A. Brokenshire, John A. Gunnels, Michael D. Kistler
IPC classification: G06F7/52
CPC classification: G06F17/16
Abstract: A block matrix multiplication mechanism is provided for reversing the visitation order of blocks at corner turns when performing a block matrix multiplication operation in a data processing system. The mechanism increases block size and divides each block into sub-blocks. By reversing the visitation order, the mechanism eliminates a sub-block load at the corner turns. The mechanism performs sub-block matrix multiplication for each sub-block in a given block, and then repeats operation for a next block until all blocks are computed. The mechanism may determine block size and sub-block size to optimize load balancing and memory bandwidth. Therefore, the mechanism reduces maximum throughput and increases performance. In addition, the mechanism also reduces the number of multi-buffered local store buffers.
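The idea of reversing the visitation order at corner turns can be sketched with a toy model in Python. This is an illustration under stated assumptions, not the mechanism claimed in the application: the names visit_order and count_loads are hypothetical, sub-blocks are not modeled, and the model assumes one buffer for the current row panel of the first operand and one for the current column panel of the second, so a load is counted only when the needed panel index changes.

```python
def visit_order(n_rows, n_cols, serpentine):
    """List the (i, j) block indices of the result in visitation order.

    With serpentine=True the column direction is reversed on every other
    row, so the column index does not change at a corner turn.
    """
    order = []
    for i in range(n_rows):
        cols = range(n_cols)
        if serpentine and i % 2 == 1:
            cols = reversed(cols)
        order.extend((i, j) for j in cols)
    return order


def count_loads(order):
    """Count panel loads under the assumed single-buffer-per-operand model."""
    loads = 0
    last_i = last_j = None
    for i, j in order:
        if i != last_i:   # a new row panel of the first operand is needed
            loads += 1
        if j != last_j:   # a new column panel of the second operand is needed
            loads += 1
        last_i, last_j = i, j
    return loads


if __name__ == "__main__":
    print(count_loads(visit_order(3, 4, serpentine=False)))
    print(count_loads(visit_order(3, 4, serpentine=True)))
```

In this model the reversed order saves one load at each corner turn, that is, one per row of blocks after the first.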

43. Invention patent application

US20090150615A1 METHOD AND STRUCTURE FOR PRODUCING HIGH PERFORMANCE LINEAR ALGEBRA ROUTINES USING STREAMING (status: lapsed)
Publication number: US20090150615A1
Publication date: 2009-06-11
Application number: US12348869
Filing date: 2009-01-05
Applicants: Fred Gehrung Gustavson, John A. Gunnels
Inventors: Fred Gehrung Gustavson, John A. Gunnels
IPC classification: G06F12/08, G06F17/16
CPC classification: G06F17/16
Abstract: A method (and structure) for executing a linear algebra subroutine on a computer having a cache, includes streaming data for matrices involved in processing the linear algebra subroutine such that data is processed using data for a first matrix stored in the cache as a matrix format and data from a second matrix and a third matrix is stored in a memory device at a higher level than the cache, the streaming providing data from the higher level as the streaming data is required for the processing.

44. Invention patent application

US20090144744A1 Performance Evaluation of Algorithmic Tasks and Dynamic Parameterization on Multi-Core Processing Systems (status: under examination, published)
Publication number: US20090144744A1
Publication date: 2009-06-04
Application number: US11947156
Filing date: 2007-11-29
Applicants: John A. Gunnels, Shakti Kapoor, Ravi Kothari, Yogish Sabharwal, James C. Sexton
Inventors: John A. Gunnels, Shakti Kapoor, Ravi Kothari, Yogish Sabharwal, James C. Sexton
IPC classification: G06F9/50, G06F9/44
CPC classification: G06F9/4881, G06F11/3414, G06F11/3447, G06F2201/88, G06F2209/483, G06F2209/485
Abstract: A method for evaluating performance of DMA-based algorithmic tasks on a target multi-core processing system includes the steps of: inputting a template for a specified task, the template including DMA-related parameters specifying DMA operations and computational operations to be performed; evaluating performance for the specified task by running a benchmark on the target multi-core processing system, the benchmark being operative to generate data access patterns using DMA operations and invoking prescribed computation routines as specified by the input template; and providing results of the benchmark indicative of a measure of performance of the specified task corresponding to the target multi-core processing system.

45. Invention patent application

US20090063529A1 METHOD AND STRUCTURE FOR FAST IN-PLACE TRANSFORMATION OF STANDARD FULL AND PACKED MATRIX DATA FORMATS (status: lapsed)
Publication number: US20090063529A1
Publication date: 2009-03-05
Application number: US12033581
Filing date: 2008-02-19
Applicants: Fred Gehrung Gustavson, John A. Gunnels, James C. Sexton
Inventors: Fred Gehrung Gustavson, John A. Gunnels, James C. Sexton
IPC classification: G06F17/30
CPC classification: G06F17/16, G06F7/76
Abstract: A computerized method provides for an in-place transformation of matrix A data including a New Data Structure (NDS) format and a transformation T having a compact representation. The NDS represents data of the matrix A in a format other than a row major format or a column major format, such that the data for the matrix A is stored as contiguous sub matrices of size MB by NB in an order predetermined to provide the data for a matrix processing. The transformation T is applied to the MB by NB blocks, using an in-place transformation processing, thereby replacing data of the block A1 with the contents of T(A1).

46. Invention patent application

US20080313441A1 METHOD AND STRUCTURE FOR PRODUCING HIGH PERFORMANCE LINEAR ALGEBRA ROUTINES USING REGISTER BLOCK DATA FORMAT ROUTINES (status: lapsed)
Publication number: US20080313441A1
Publication date: 2008-12-18
Application number: US12196095
Filing date: 2008-08-21
Applicants: Fred Gehrung Gustavson, John A. Gunnels, James C. Sexton
Inventors: Fred Gehrung Gustavson, John A. Gunnels, James C. Sexton
IPC classification: G06F9/48
CPC classification: G06F12/0875, G06F17/16
Abstract: A method (and structure) of executing a matrix operation, includes, for a matrix A, separating the matrix A into blocks, each block having a size p-by-q. The blocks of size p-by-q are then stored in a cache or memory in at least one of the two following ways. The elements in at least one of the blocks is stored in a format in which elements of the block occupy a location different from an original location in the block, and/or the blocks of size p-by-q are stored in a format in which at least one block occupies a position different relative to its original position in the matrix A.

47. Granted invention patent

US08645447B2 Method and structure for cache aware transposition via rectangular subsections (status: lapsed)
Publication number: US08645447B2
Publication date: 2014-02-04
Application number: US11035933
Filing date: 2005-01-14
Applicants: Fred Gehrung Gustavson, John A. Gunnels
Inventors: Fred Gehrung Gustavson, John A. Gunnels
IPC classification: G06F17/14
CPC classification: G06F7/78
Abstract: A method and structure for transposing a rectangular matrix A in a computer includes subdividing the rectangular matrix A into one or more square submatrices and executing an in-place transposition for each of the square submatrices Aij.
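The per-submatrix step named in this abstract can be sketched in Python as follows. The sketch covers only that one step: it subdivides a rectangular matrix into k-by-k squares and transposes each square in place by swapping elements across its diagonal. Rearranging the squares to obtain the full transpose of the rectangular matrix is not shown, and the function names and the requirement that both dimensions be multiples of k are assumptions made for this illustration.

```python
def transpose_square_in_place(a, r0, c0, k):
    """Transpose the k-by-k submatrix of `a` whose top-left corner is (r0, c0)."""
    for i in range(k):
        for j in range(i + 1, k):
            # Swap the element above the diagonal with its mirror below it.
            a[r0 + i][c0 + j], a[r0 + j][c0 + i] = a[r0 + j][c0 + i], a[r0 + i][c0 + j]


def transpose_square_blocks(a, k):
    """Transpose every k-by-k square submatrix of `a` in place (hypothetical helper)."""
    rows, cols = len(a), len(a[0])
    if rows % k or cols % k:
        raise ValueError("matrix dimensions must be multiples of k")
    for r0 in range(0, rows, k):
        for c0 in range(0, cols, k):
            transpose_square_in_place(a, r0, c0, k)


if __name__ == "__main__":
    matrix = [[1, 2, 3, 4],
              [5, 6, 7, 8]]
    transpose_square_blocks(matrix, 2)
    print(matrix)
```

Because each swap touches only two elements of the same square, no extra storage proportional to the matrix size is needed for this step.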

48. Granted invention patent

US08250130B2 Reducing bandwidth requirements for matrix multiplication (status: in force)
Publication number: US08250130B2
Publication date: 2012-08-21
Application number: US12129789
Filing date: 2008-05-30
Applicants: Daniel A. Brokenshire, John A. Gunnels, Michael D. Kistler
Inventors: Daniel A. Brokenshire, John A. Gunnels, Michael D. Kistler
IPC classification: G06F17/16
CPC classification: G06F17/16
Abstract: A block matrix multiplication mechanism is provided for reversing the visitation order of blocks at corner turns when performing a block matrix multiplication operation in a data processing system. The mechanism increases block size and divides each block into sub-blocks. By reversing the visitation order, the mechanism eliminates a sub-block load at the corner turns. The mechanism performs sub-block matrix multiplication for each sub-block in a given block, and then repeats operation for a next block until all blocks are computed. The mechanism may determine block size and sub-block size to optimize load balancing and memory bandwidth. Therefore, the mechanism reduces maximum throughput and increases performance. In addition, the mechanism also reduces the number of multi-buffered local store buffers.

49. Granted invention patent

US08037215B2 Performance evaluation of algorithmic tasks and dynamic parameterization on multi-core processing systems (status: in force)
Publication number: US08037215B2
Publication date: 2011-10-11
Application number: US12130167
Filing date: 2008-05-30
Applicants: John A. Gunnels, Shakti Kapoor, Ravi Kothari, Yogish Sabharwal, James C. Sexton
Inventors: John A. Gunnels, Shakti Kapoor, Ravi Kothari, Yogish Sabharwal, James C. Sexton
IPC classification: G06F13/28, G06F17/50
CPC classification: G06F11/3404, G06F11/3428, G06F11/3433, G06F11/3447
Abstract: Apparatus for evaluating the performance of DMA-based algorithmic tasks on a target multi-core processing system includes a memory and at least one processor coupled to the memory. The processor is operative: to input a template for a specified task, the template including DMA-related parameters specifying DMA operations and computational operations to be performed; to evaluate performance for the specified task by running a benchmark on the target multi-core processing system, the benchmark being operative to generate data access patterns using DMA operations and invoking prescribed computation routines as specified by the input template; and to provide results of the benchmark indicative of a measure of performance of the specified task corresponding to the target multi-core processing system.

50. Granted invention patent

US07853820B2 System and method for detecting a faulty object in a system (status: in force)
Publication number: US07853820B2
Publication date: 2010-12-14
Application number: US12256355
Filing date: 2008-10-22
Applicants: John A. Gunnels, Fred Gehrung Gustavson, Robert Daniel Engle
Inventors: John A. Gunnels, Fred Gehrung Gustavson, Robert Daniel Engle
IPC classification: G06F11/00
CPC classification: G06F11/0751
Abstract: A method (and system) for detecting at least one faulty object in a system including a plurality of objects in communication with each other in an n-dimensional architecture, includes probing a first plane of objects in the n-dimensional architecture and probing at least one other plane of objects in the n-dimensional architecture which would result in identifying a faulty object in the system.
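One way to picture plane probing is the following Python sketch. It is a simplified illustration and not the patented procedure: it assumes exactly one faulty object, assumes a probe of a plane reports only whether that plane contains a fault, and uses the hypothetical names locate_fault and plane_has_fault. Under those assumptions, the one failing plane along each axis gives one coordinate of the faulty object.

```python
def locate_fault(shape, plane_has_fault):
    """Return the coordinates of the single faulty object, or None if ambiguous.

    shape           -- size of the grid along each axis, e.g. (3, 4, 2)
    plane_has_fault -- assumed probe: plane_has_fault(axis, k) is True when the
                       plane of objects with coordinate k on `axis` contains a fault
    """
    coords = []
    for axis, size in enumerate(shape):
        failing = [k for k in range(size) if plane_has_fault(axis, k)]
        if len(failing) != 1:
            # Zero or several failing planes: the single-fault assumption is violated.
            return None
        coords.append(failing[0])
    return tuple(coords)


if __name__ == "__main__":
    faulty = (1, 2, 0)  # made-up fault location used only for this demonstration

    def probe(axis, k):
        return faulty[axis] == k

    print(locate_fault((3, 4, 2), probe))
```

For a grid of shape (3, 4, 2) this takes at most 3 + 4 + 2 plane probes instead of testing all 24 objects one by one.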
