Quick Patent Search - Fast search of global patents, free commercial patent database - IPRDB

1. Invention application

US20070174825A1 Apparatus and method for optimizing scalar code executed on a SIMD engine by alignment of SIMD slots (status: pending, published)
Publication No.: US20070174825A1
Publication date: 2007-07-26
Application No.: US11339591
Filing date: 2006-01-25
Applicants: Alexandre Eichenberger, John Kevin O'Brien
Inventors: Alexandre Eichenberger, John Kevin O'Brien
IPC classification: G06F9/45
CPC classification: G06F9/3885, G06F9/30032, G06F9/30036, G06F9/30109, G06F9/3824
Abstract: An apparatus and method for optimizing scalar code executed on a single instruction multiple data (SIMD) engine is provided that aligns the slots of SIMD registers. With the apparatus and method, a compiler is provided that parses source code and, for each statement in the program, generates an expression tree. The compiler inspects all storage inputs to scalar operations in the expression tree to determine their alignment in the SIMD registers. This alignment is propagated up the expression tree from the leaves. When the alignments of two operands in the expression tree are the same, the resulting alignment is the shared value. When the alignments of two operands in the expression tree are different, one operand is shifted. For shifted operands, a shift operation is inserted in the expression tree. The executable code is then generated for the expression tree and shifts are inserted where indicated.
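The alignment propagation described in this abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the `Leaf`, `Op`, and `Shift` classes, the `align` and `count_shifts` functions, and the rule of always shifting the right operand are assumptions made for the example (the abstract says only that "one operand is shifted").

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    name: str
    slot: int  # hypothetical: SIMD slot the scalar occupies once loaded


@dataclass
class Op:
    op: str
    left: "Node"
    right: "Node"


@dataclass
class Shift:
    child: "Node"
    to_slot: int  # slot the child's value is moved into


Node = Union[Leaf, Op, Shift]


def align(node):
    """Return (rewritten_node, slot), propagating slots up from the leaves."""
    if isinstance(node, Leaf):
        return node, node.slot
    left, left_slot = align(node.left)
    right, right_slot = align(node.right)
    if left_slot != right_slot:
        # Assumption: shift the right operand into the left operand's slot.
        right = Shift(right, left_slot)
    # Same slots: the result keeps the shared slot.
    return Op(node.op, left, right), left_slot


def count_shifts(node):
    """Count the Shift nodes inserted into a rewritten tree."""
    if isinstance(node, Leaf):
        return 0
    if isinstance(node, Shift):
        return 1 + count_shifts(node.child)
    return count_shifts(node.left) + count_shifts(node.right)
```

For a statement such as `a * b + c` with `a` and `b` in slot 1 and `c` in slot 3, the sketch inserts a single shift on `c` and reports slot 1 for the whole expression.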

2. Invention application

US20050283769A1 System and method for efficient data reorganization to satisfy data alignment constraints (status: no longer in force)
Publication No.: US20050283769A1
Publication date: 2005-12-22
Application No.: US10862483
Filing date: 2004-06-07
Applicants: Alexandre Eichenberger, John O'Brien, Peng Wu
Inventors: Alexandre Eichenberger, John O'Brien, Peng Wu
IPC classification: G06F9/45
CPC classification: G06F8/4452
Abstract: A system and method is provided for vectorizing misaligned references in compiled code for SIMD architectures that support only aligned loads and stores. In the framework presented herein, a loop is first simdized as if the memory unit imposes no alignment constraints. The compiler then inserts data reorganization operations to satisfy the actual alignment requirement of the hardware. Finally, the code generation algorithm generates SIMD codes based on the data reorganization graph, addressing realistic issues such as runtime alignments, unknown loop bounds, residue iteration counts, and multiple statements with arbitrary alignment combinations. Beyond generating a valid simdization, a preferred embodiment further improves the quality of the generated codes. Four stream-shift placement policies are disclosed, which minimize the number of data reorganization generated by the alignment handling.
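To make the idea of a data reorganization operation concrete, here is a minimal sketch of one common way to emulate a misaligned vector load on hardware that supports only aligned loads: load the two aligned vectors that straddle the address, then select the wanted elements. The abstract does not spell out this particular operation, so the vector width `V`, the list-based memory model, and the function names are all assumptions for illustration.

```python
V = 4  # hypothetical vector width, in elements


def aligned_load(mem, addr):
    """Model an aligned-only load: the address is truncated to a multiple of V."""
    base = (addr // V) * V
    return mem[base:base + V]


def misaligned_load(mem, addr):
    """Emulate a load of V elements starting at any addr using aligned loads."""
    offset = addr % V
    low = aligned_load(mem, addr)
    if offset == 0:
        return low  # already aligned, no reorganization needed
    high = aligned_load(mem, addr + V)
    # Shift the concatenated stream left by `offset` and keep V elements.
    return (low + high)[offset:offset + V]
```

The sketch assumes `mem` is padded so that the second aligned load stays in bounds. Reducing how often such shifts are emitted is what the abstract attributes to its stream-shift placement policies.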

3. Invention application

US20070226723A1 Efficient generation of SIMD code in presence of multi-threading and other false sharing conditions and in machines having memory protection support (status: granted, in force)
Publication No.: US20070226723A1
Publication date: 2007-09-27
Application No.: US11358372
Filing date: 2006-02-21
Applicants: Alexandre Eichenberger, Kai-Ting Wang, Peng Wu, Peng Zhao
Inventors: Alexandre Eichenberger, Kai-Ting Wang, Peng Wu, Peng Zhao
IPC classification: G06F9/45
CPC classification: G06F9/3851, G06F8/44
Abstract: A computer implemented method, system and computer program product for automatically generating SIMD code, particularly in the presence of multi-threading and other false sharing conditions, and in machines having a segmented/virtual page memory protection system. The method begins by analyzing data to be accessed by a targeted loop including at least one statement, where each statement has at least one memory reference, to determine if memory accesses are safe. If memory accesses are safe, the targeted loop is simdized. If not safe, it is determined if a scheme can be applied in which safety need not be guaranteed. If such a scheme can be applied, the scheme is applied and the targeted loop is simdized according to the scheme. If such a scheme cannot be applied, it is determined if padding is appropriate. If padding is appropriate, the data is padded and the targeted loop is simdized. If padding is not appropriate, non-simdized code is generated based on the targeted loop for handling boundary conditions, the targeted loop is simdized, and the simdized targeted loop is combined with the non-simdized code.
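The decision sequence in this abstract can be summarized as a small function. This is only a sketch of the control flow as the abstract states it; the function name, the boolean parameters, and the step strings are hypothetical, and the analyses that would produce those booleans are not modeled.

```python
def choose_simdization_steps(accesses_safe, unguarded_scheme_applies, padding_ok):
    """Return the ordered compiler steps for one targeted loop.

    All three inputs are assumed to come from earlier analyses (hypothetical).
    """
    if accesses_safe:
        return ["simdize loop"]
    if unguarded_scheme_applies:
        # A scheme in which safety need not be guaranteed.
        return ["apply scheme", "simdize loop according to scheme"]
    if padding_ok:
        return ["pad data", "simdize loop"]
    return [
        "generate non-simdized boundary code",
        "simdize loop",
        "combine simdized loop with boundary code",
    ]
```

Each later branch is reached only when every earlier check has failed, which mirrors the order of the tests in the abstract.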

4. Invention application

US20070011441A1 Method and system for data-driven runtime alignment operation (status: pending, published)
Publication No.: US20070011441A1
Publication date: 2007-01-11
Application No.: US11176988
Filing date: 2005-07-08
Applicants: Alexandre Eichenberger, Michael Gschwind, Valentina Salapura, Peng Wu
Inventors: Alexandre Eichenberger, Michael Gschwind, Valentina Salapura, Peng Wu
IPC classification: G06F9/44
CPC classification: G06F9/355, G06F9/30032, G06F9/30036, G06F9/30043, G06F9/30112, G06F9/3013, G06F9/3816, G06F9/3824
Abstract: A method for processing instructions and data in a processor includes steps of: preparing an input stream of data for processing in a data path in response to a first set of instructions specifying a dynamic parameter; and processing the input stream of data in the same data path in response to a second set of instructions. A common portion of a dataflow is used for preparing the input stream of data for processing in response to a first set of instructions under the control of a dynamic parameter specified by an instruction of the first set of instructions, and for operand data routing based on the instruction specification of a second set of instructions during the processing of the input stream in response to the second set of instructions.

5. Invention application

US20080010634A1 Framework for Integrated Intra- and Inter-Loop Aggregation of Contiguous Memory Accesses for SIMD Vectorization (status: no longer in force)
Publication No.: US20080010634A1
Publication date: 2008-01-10
Application No.: US11856284
Filing date: 2007-09-17
Applicants: Alexandre Eichenberger, Kai-Ting Wang, Peng Wu
Inventors: Alexandre Eichenberger, Kai-Ting Wang, Peng Wu
IPC classification: G06F9/45, G06F15/00
CPC classification: G06F8/4452, G06F8/445
Abstract: A method, computer program product, and information handling system for generating loop code to execute on Single-Instruction Multiple-Datapath (SIMD) architectures, where the loop contains multiple non-stride-one memory accesses that operate over a contiguous stream of memory is disclosed. A preferred embodiment identifies groups of isomorphic statements within a loop body where the isomorphic statements operate over a contiguous stream of memory over the iteration of the loop. Those identified statements are then converted in to virtual-length vector operations. Next, the hardware's available vector length is used to determine a number of virtual-length vectors to aggregate into a single vector operation for each iteration of the loop. Finally, the aggregated, vectorized loop code is converted into SIMD operations.

6. Invention application

US20050283774A1 System and method for SIMD code generation in the presence of optimized misaligned data reorganization (status: no longer in force)
Publication No.: US20050283774A1
Publication date: 2005-12-22
Application No.: US10918996
Filing date: 2004-08-16
Applicants: Alexandre Eichenberger, Kai-Ting Wang, Peng Wu
Inventors: Alexandre Eichenberger, Kai-Ting Wang, Peng Wu
IPC classification: G06F9/45
CPC classification: G06F8/4452, G06F8/447
Abstract: A method, computer program product, and information handling system for generating loop code to execute on Single-Instruction Multiple-Datapath (SIMD) architectures, where the loop operates on datatypes having different lengths, is disclosed. Further, a preferred embodiment of the present invention includes a novel technique to efficiently realign or shift arbitrary streams to an arbitrary offset, regardless whether the alignments or offsets are known at the compile time or not. This technique enables the application of advanced alignment optimizations to runtime alignment. This allows sequential loop code operating on datatypes of disparate length to be transformed (“simdized”) into optimized SIMD code through a fully automated process.

7. Invention application

US20050283775A1 Framework for integrated intra- and inter-loop aggregation of contiguous memory accesses for SIMD vectorization (status: no longer in force)
Publication No.: US20050283775A1
Publication date: 2005-12-22
Application No.: US10919115
Filing date: 2004-08-16
Applicants: Alexandre Eichenberger, Kai-Ting Wang, Peng Wu
Inventors: Alexandre Eichenberger, Kai-Ting Wang, Peng Wu
IPC classification: G06F9/45
CPC classification: G06F8/4452, G06F8/445
Abstract: A method, computer program product, and information handling system for generating loop code to execute on Single-Instruction Multiple-Datapath (SIMD) architectures, where the loop contains multiple non-stride-one memory accesses that operate over a contiguous stream of memory is disclosed. A preferred embodiment identifies groups of isomorphic statements within a loop body where the isomorphic statements operate over a contiguous stream of memory over the iteration of the loop. Those identified statements are then converted in to virtual-length vector operations. Next, the hardware's available vector length is used to determine a number of virtual-length vectors to aggregate into a single vector operation for each iteration of the loop. Finally, the aggregated, vectorized loop code is converted into SIMD operations.

8. Invention application

US20050283773A1 Framework for efficient code generation using loop peeling for SIMD loop code with multiple misaligned statements (status: no longer in force)
Publication No.: US20050283773A1
Publication date: 2005-12-22
Application No.: US10918879
Filing date: 2004-08-16
Applicants: Alexandre Eichenberger, Kai-Ting Wang, Peng Wu
Inventors: Alexandre Eichenberger, Kai-Ting Wang, Peng Wu
IPC classification: G06F9/45
CPC classification: G06F8/447, G06F8/4441
Abstract: A system and method is provided for vectorizing misaligned references in compiled code for SIMD architectures that support only aligned loads and stores. In this framework, a loop is first simdized as if the memory unit imposes no alignment constraints. The compiler then inserts data reorganization operations to satisfy the actual alignment requirements of the hardware. Finally, the code generation algorithm generates SIMD codes based on the data reorganization graph, addressing realistic issues such as runtime alignments, unknown loop bounds, residual iteration counts, and multiple statements with arbitrary alignment combinations. Loop peeling is used to reduce the computational overhead associated with misaligned data. A loop prologue and epilogue are peeled from individual iterations in the simdized loop, and vector-splicing instructions are applied to the peeled iterations, while the steady-state loop body incurs no additional computational overhead.
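The prologue / steady-state / epilogue split that loop peeling produces can be illustrated with a small sketch. Note the substitution: the abstract applies vector-splicing instructions to the peeled iterations, whereas this sketch uses plain scalar iterations for them, purely to show where the boundaries fall. The vector width `V` and the function name are assumptions.

```python
V = 4  # hypothetical vector width, in elements


def add_one_peeled(data, start, end):
    """Add 1 to data[start:end]; only whole aligned blocks run in the main loop."""
    out = list(data)
    # Round start up and end down to multiples of V (clamped to the range).
    first_aligned = min(end, -(-start // V) * V)
    last_aligned = max(first_aligned, (end // V) * V)

    # Prologue: peeled iterations before the first aligned block.
    for i in range(start, first_aligned):
        out[i] = data[i] + 1
    # Steady state: one aligned block of V elements per iteration.
    for base in range(first_aligned, last_aligned, V):
        out[base:base + V] = [x + 1 for x in data[base:base + V]]
    # Epilogue: peeled iterations after the last aligned block.
    for i in range(last_aligned, end):
        out[i] = data[i] + 1
    return out
```

With `start=3` and `end=14`, the prologue handles index 3, the steady-state loop handles the aligned blocks at 4 and 8, and the epilogue handles indices 12 and 13.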

9. Invention application

US20050273769A1 Framework for generating mixed-mode operations in loop-level simdization (status: granted, in force)
Publication No.: US20050273769A1
Publication date: 2005-12-08
Application No.: US10919005
Filing date: 2004-08-16
Applicants: Alexandre Eichenberger, Kai-Ting Wang, Peng Wu
Inventors: Alexandre Eichenberger, Kai-Ting Wang, Peng Wu
IPC classification: G06F9/45
CPC classification: G06F8/4452
Abstract: A method, computer program product, and information handling system for generating mixed-mode operations in the compilation of program code for processors having vector or SIMD processing units is disclosed. In a preferred embodiment of the present invention, program instructions making up the body of a loop are abstracted into virtual vector instructions. These virtual vector instructions are treated, for initial code optimization purposes, as vector instructions (i.e., instructions written for the vector unit). The virtual vector instructions are eventually expanded into native code for the target processor, at which time a determination is made for each virtual vector instruction as to whether to expand the virtual vector instruction into native vector instructions, into native scalar instructions, into calls to pre-defined library functions, or into a combination of these. A cost model is used to determine the optimal choice of expansion based on hardware/software constraints, performance costs/benefits, and other criteria.
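A cost-driven choice among expansion modes might look like the following sketch. The abstract does not describe the cost model itself, so the mode names, the per-mode cost numbers, and the "cheapest available mode wins" rule are assumptions made only to illustrate the selection step.

```python
def choose_expansion(costs, available):
    """Pick the cheapest expansion mode for one virtual vector instruction.

    costs: hypothetical mapping of mode name -> estimated cost.
    available: modes the target hardware/software actually supports.
    """
    candidates = {mode: cost for mode, cost in costs.items() if mode in available}
    if not candidates:
        raise ValueError("no expansion mode available for this instruction")
    return min(candidates, key=candidates.get)
```

For example, an instruction with no native vector form on the target would have `"vector"` missing from `available`, so the choice falls to the cheaper of the scalar and library expansions.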

10. Invention application

US20050132172A1 Method and apparatus for eliminating the need for register assignment, allocation, spilling and re-filling (status: granted, in force)
Publication No.: US20050132172A1
Publication date: 2005-06-16
Application No.: US10735054
Filing date: 2003-12-12
Applicants: Alexandre Eichenberger, Erik Altman, Sumedh Sathaye, John-David Wellman
Inventors: Alexandre Eichenberger, Erik Altman, Sumedh Sathaye, John-David Wellman
IPC classification: G06F9/30, G06F9/318, G06F9/38
CPC classification: G06F9/30181, G06F9/30076, G06F9/30134, G06F9/3836, G06F9/3838, G06F9/384, G06F9/3855
Abstract: A method and apparatus is provided to manage data in computer registers in a program, making more computer registers available to one or more programmers utilizing a name level instruction. The method and apparatus disclosed herein presents a way of reducing the overhead of register management, by introducing a concept of a name level for each of the named architected registers in a processor. The method provides a programmer with a larger register name-space while not increasing the size of the instruction word in the processor instruction-set architecture. It also provides for the facilitation of architectural features which overload the architected register namespace and ease the overhead of register management. This provides for the addition of more computer registers without changing the instruction format of the computer.
