Quick Patent Search - Fast search of global patents, free patent database for commercial use - IPRDB

1. Granted invention patent

US07916864B2 Graphics processing unit used for cryptographic processing (in force)
Publication No.: US07916864B2
Publication date: 2011-03-29
Application No.: US11350137
Filing date: 2006-02-08
Applicant: Norbert Juffa
Inventor: Norbert Juffa
IPC classification: H04N7/167, G06F17/00, G06F15/76
CPC classification: G06F21/72, G06F9/30181, G06F9/3879, G06F2207/3824
Abstract: A graphics processing unit is programmed to carry out cryptographic processing so that fast, effective cryptographic processing solutions can be provided without incurring additional hardware costs. The graphics processing unit can efficiently carry out cryptographic processing because it has an architecture that is configured to handle a large number of parallel processes. The cryptographic processing carried out on the graphics processing unit can be further improved by configuring the graphics processing unit to be capable of both floating point and integer operations.

2. Granted invention patent

US07506134B1 Hardware resource based mapping of cooperative thread arrays (CTA) to result matrix tiles for efficient matrix multiplication in computing system comprising plurality of multiprocessors (in force)
Publication No.: US07506134B1
Publication date: 2009-03-17
Application No.: US11454542
Filing date: 2006-06-16
Applicants: Norbert Juffa, Radoslav Danilak
Inventors: Norbert Juffa, Radoslav Danilak
IPC classification: G06F9/46
CPC classification: G06F9/5066, G06F9/5038, G06F2209/5017
Abstract: The present invention enables efficient matrix multiplication operations on parallel processing devices. One embodiment is a method for mapping CTAs to result matrix tiles for matrix multiplication operations. Another embodiment is a second method for mapping CTAs to result tiles. Yet other embodiments are methods for mapping the individual threads of a CTA to the elements of a tile for result tile computations, source tile copy operations, and source tile copy and transpose operations. The present invention advantageously enables result matrix elements to be computed on a tile-by-tile basis using multiple CTAs executing concurrently on different streaming multiprocessors, enables source tiles to be copied to local memory to reduce the number accesses from the global memory when computing a result tile, and enables coalesced read operations from the global memory as well as write operations to the local memory without bank conflicts.
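Editorial illustration: the tile-by-tile computation described in this abstract can be sketched in plain Python. This is only a sketch under stated assumptions, not the patented mapping. The name tiled_matmul and the tile parameter are hypothetical, and the sequential loops merely stand in for CTAs and threads that would execute concurrently on a GPU.

```python
def tiled_matmul(a, b, tile=2):
    """Multiply a (m x k) by b (k x n), computing one result tile at a time."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    # Each (row0, col0) pair stands in for one CTA that owns one result tile.
    for row0 in range(0, m, tile):
        for col0 in range(0, n, tile):
            rows = range(row0, min(row0 + tile, m))
            cols = range(col0, min(col0 + tile, n))
            for k0 in range(0, k, tile):
                ks = range(k0, min(k0 + tile, k))
                # Copy the source tiles into small buffers (the "local memory").
                a_tile = [[a[i][p] for p in ks] for i in rows]
                b_tile = [[b[p][j] for j in cols] for p in ks]
                # Each (i, j) pair stands in for one thread of the CTA.
                for ti, i in enumerate(rows):
                    for tj, j in enumerate(cols):
                        c[i][j] += sum(
                            a_tile[ti][tp] * b_tile[tp][tj]
                            for tp in range(len(ks))
                        )
    return c
```

Because every element of a source tile is read from the original matrices once per tile rather than once per result element, the copy step is what reduces traffic to the slower memory in this model.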

3. Granted invention patent

US06381625B2 Method and apparatus for calculating a power of an operand (in force)
Publication No.: US06381625B2
Publication date: 2002-04-30
Application No.: US09782474
Filing date: 2001-02-12
Applicants: Stuart Oberman, Norbert Juffa, Ming Siu, Frederick D Weber, Ravikrishna Cherukuri
Inventors: Stuart Oberman, Norbert Juffa, Ming Siu, Frederick D Weber, Ravikrishna Cherukuri
IPC classification: G06F7/552
CPC classification: G06F7/53, G06F7/4991, G06F7/49936, G06F7/49963, G06F7/49994, G06F7/5338, G06F7/5443, G06F9/30036, G06F9/3017, G06F9/3804, G06F9/3885, G06F17/16, G06F2207/3828
Abstract: A multiplier capable of performing signed and unsigned scalar and vector multiplication is disclosed. The multiplier is configured to receive signed or unsigned multiplier and multiplicand operands in scalar or packed vector form. An effective sign for the multiplier and multiplicand operands may be calculated and used to create and select a number of partial products according to Booth's algorithm. Once the partial products have been created and selected, they may be summed and the results may be output. The results may be signed or unsigned, and may represent vector or scalar quantities. When a vector multiplication is performed, the multiplier may be configured to generate and select partial products so as to effectively isolate the multiplication process for each pair of vector components. The multiplier may also be configured to sum the products of the vector components to form the vector dot product. The final product may be output in segments so as to require fewer bus lines. The segments may be rounded by adding a rounding constant. Rounding and normalization may be performed in two paths, one assuming an overflow will occur, the other assuming no overflow will occur. The multiplier may also be configured to perform iterative calculations to evaluate constant powers of an operand. Intermediate products that are formed may be rounded and normalized in two paths and then compressed and stored for use in the next iteration. An adjustment constant may also be added to increase the frequency of exactly rounded results.
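Editorial illustration: the abstract refers to creating and selecting partial products according to Booth's algorithm. A minimal sketch of the common radix-4 variant follows. The radix, the function names, and the restriction to signed two's-complement multipliers are assumptions; the abstract does not state which variant the patent uses, and the unsigned and packed-vector cases are not modelled.

```python
def booth_radix4_digits(multiplier, bits=8):
    """Recode a two's-complement multiplier into radix-4 Booth digits.

    Each digit lies in {-2, -1, 0, 1, 2}; digit i has weight 4**i.
    The multiplier must fit in `bits` signed bits.
    """
    if bits % 2:
        raise ValueError("bits must be even for radix-4 recoding")
    pattern = multiplier & ((1 << bits) - 1)  # raw two's-complement bits
    digits = []
    prev = 0  # implicit 0 to the right of the least significant bit
    for i in range(0, bits, 2):
        low = (pattern >> i) & 1
        high = (pattern >> (i + 1)) & 1
        digits.append(low + prev - 2 * high)
        prev = high
    return digits


def booth_multiply(multiplicand, multiplier, bits=8):
    """Select one partial product per Booth digit, then sum them."""
    partial_products = [
        digit * multiplicand * 4 ** i
        for i, digit in enumerate(booth_radix4_digits(multiplier, bits))
    ]
    return sum(partial_products)
```

With 8-bit operands this recoding yields four partial products instead of eight, which is the usual motivation for Booth recoding in hardware multipliers.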

4. Granted invention patent

US06370637B1 Optimized allocation of multi-pipeline executable and specific pipeline executable instructions to execution pipelines based on criteria (in force)
Publication No.: US06370637B1
Publication date: 2002-04-09
Application No.: US09370789
Filing date: 1999-08-05
Applicants: Stephan G. Meier, Norbert Juffa, Frederick D. Weber, Stuart F. Oberman
Inventors: Stephan G. Meier, Norbert Juffa, Frederick D. Weber, Stuart F. Oberman
IPC classification: G06F9/38
CPC classification: G06F9/30032, G06F9/30192, G06F9/3836, G06F9/384, G06F9/3855, G06F9/3857
Abstract: A microprocessor with a floating point unit configured to efficiently allocate multi-pipeline executable instructions is disclosed. Multi-pipeline executable instructions are instructions that are not forced to execute in a particular type of execution pipe. For example, junk ops are multi-pipeline executable. A junk op is an instruction that is executed at an early stage of the floating point unit's pipeline (e.g., during register rename), but still passes through an execution pipeline for exception checking. Junk ops are not limited to a particular execution pipeline, but instead may pass through any of the microprocessor's execution pipelines in the floating point unit. Multi-pipeline executable instructions are allocated on a per-clock cycle basis using a number of different criteria. For example, the allocation may vary depending upon the number of multi-pipeline executable instructions received by the floating point unit in a single clock cycle.

5. Granted invention patent

US06247117B1 Apparatus and method for using checking instructions in a floating-point execution unit (in force)
Publication No.: US06247117B1
Publication date: 2001-06-12
Application No.: US09265230
Filing date: 1999-03-08
Applicant: Norbert Juffa
Inventor: Norbert Juffa
IPC classification: G06F15/00
CPC classification: G06F9/226, G06F9/30014, G06F9/30192
Abstract: The use of checking instructions to detect special and exceptional cases of a defined data format in a microprocessor is disclosed. Generally speaking, a checking instruction is included with the microcode of floating-point instructions to detect special and exceptional cases of operand values for the floating-point instructions. A checking instruction is configured to set one or more flags in a flags register if it detects a special or exceptional case for an operand value. A checking instruction may also set the result or results of a floating-point instruction to a result value if a special or exceptional case is detected. In addition, a checking instruction may be configured to set one or more bits in status register if a special or exceptional case is detected. After a checking instruction completes execution, a subsequent microcode instruction can be executed to determine if one or more flags were set by the checking instruction. If one or more flags have been set by the checking instruction, the subsequent microcode instruction can branch to a non-sequential microcode instruction to handle the special or exceptional case detected by the checking instruction.
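Editorial illustration: the check-then-branch pattern in this abstract can be mimicked in software. The sketch below assumes IEEE 754 single-precision operands; the flag names, check_operand, and checked_reciprocal are hypothetical and are not taken from the patent. The input must be representable as a 32-bit float, since struct.pack raises OverflowError for finite values that are too large.

```python
import math
import struct

# Hypothetical flag bits that a checking step might set.
FLAG_ZERO, FLAG_DENORMAL, FLAG_INFINITY, FLAG_NAN = 1, 2, 4, 8


def check_operand(value):
    """Return a flag mask describing special cases of a float32 operand."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0:
        return FLAG_ZERO if fraction == 0 else FLAG_DENORMAL
    if exponent == 0xFF:
        return FLAG_INFINITY if fraction == 0 else FLAG_NAN
    return 0


def checked_reciprocal(value):
    """Run the check first, then branch to a special-case handler if flagged."""
    flags = check_operand(value)
    if flags & FLAG_NAN:
        return math.nan
    if flags & FLAG_ZERO:
        return math.copysign(math.inf, value)
    if flags & FLAG_INFINITY:
        return math.copysign(0.0, value)
    # Denormal and ordinary operands fall through to the ordinary path here.
    return 1.0 / value
```

Keeping the classification in one step means the main computation needs a single flag test rather than a chain of operand comparisons.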

6. Granted invention patent

US6115732A Method and apparatus for compressing intermediate products (lapsed)
Publication No.: US6115732A
Publication date: 2000-09-05
Application No.: US75418
Filing date: 1998-05-08
Applicants: Stuart F. Oberman, Norbert Juffa, Fred Weber
Inventors: Stuart F. Oberman, Norbert Juffa, Fred Weber
IPC classification: G06F7/52, G06F7/533, G06F7/544, G06F9/318, G06F9/38, G06F17/16
CPC classification: G06F7/53, G06F17/16, G06F7/5443, G06F9/30036, G06F9/3017, G06F9/3804, G06F9/3885, G06F2207/3828, G06F7/4991, G06F7/49936, G06F7/49963, G06F7/49994, G06F7/5338
Abstract: A processor capable of efficiently performing iterative calculations is disclosed. The processor comprises a multiplier that is configured to perform iterative multiplication operations to evaluate constant powers of an operand such as the reciprocal and reciprocal square root. Intermediate products that are formed are compressed and decompressed to reduce interim storage requirements. The intermediate products may be rounded and normalized in two paths, one assuming an overflow will occur, and then compressed and stored for use in the next iteration.
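Editorial illustration: the abstract mentions iterative multiplication to evaluate the reciprocal and reciprocal square root. One well-known way to do that with multiplies only is Newton-Raphson iteration, sketched below. The abstract does not name the iteration used, so the choice of Newton-Raphson, the function names, the fixed starting guesses, and the input range are all assumptions. The compression of intermediate products that gives the patent its title is not modelled.

```python
def newton_reciprocal(b, iterations=6):
    """Approximate 1/b for 1 <= b < 2 using only multiplies and subtracts."""
    x = 0.75  # fixed starting guess; a table lookup is a common alternative
    for _ in range(iterations):
        x = x * (2.0 - b * x)  # the error roughly squares each iteration
    return x


def newton_rsqrt(b, iterations=6):
    """Approximate 1/sqrt(b) for 1 <= b < 2 with the same style of iteration."""
    y = 0.85  # fixed starting guess for this input range
    for _ in range(iterations):
        y = y * (1.5 - 0.5 * b * y * y)
    return y
```

Each pass reuses the previous intermediate result, which is why storing those intermediates compactly between iterations matters in a hardware implementation.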

7. Invention patent application

US20100325187A1 EFFICIENT MATRIX MULTIPLICATION ON A PARALLEL PROCESSING DEVICE (in force)
Publication No.: US20100325187A1
Publication date: 2010-12-23
Application No.: US12875961
Filing date: 2010-09-03
Applicants: Norbert Juffa, Radoslav Danilak
Inventors: Norbert Juffa, Radoslav Danilak
IPC classification: G06F7/52
CPC classification: G06F17/16
Abstract: The present invention enables efficient matrix multiplication operations on parallel processing devices. One embodiment is a method for mapping CTAs to result matrix tiles for matrix multiplication operations. Another embodiment is a second method for mapping CTAs to result tiles. Yet other embodiments are methods for mapping the individual threads of a CTA to the elements of a tile for result tile computations, source tile copy operations, and source tile copy and transpose operations. The present invention advantageously enables result matrix elements to be computed on a tile-by-tile basis using multiple CTAs executing concurrently on different streaming multiprocessors, enables source tiles to be copied to local memory to reduce the number accesses from the global memory when computing a result tile, and enables coalesced read operations from the global memory as well as write operations to the local memory without bank conflicts.

8. Granted invention patent

US06397239B2 Floating point addition pipeline including extreme value, comparison and accumulate functions (in force)
Publication No.: US06397239B2
Publication date: 2002-05-28
Application No.: US09778352
Filing date: 2001-02-06
Applicants: Stuart F. Oberman, Norbert Juffa, Fred Weber, Krishnan Ramani, Ravi Krishna Cherukuri
Inventors: Stuart F. Oberman, Norbert Juffa, Fred Weber, Krishnan Ramani, Ravi Krishna Cherukuri
IPC classification: G06F7/42
CPC classification: G06F7/483, G06F9/30014, G06F9/30021, G06F9/30036, H03M7/24
Abstract: A multimedia execution unit configured to perform vectored floating point and integer instructions. The execution unit may include an add/subtract pipeline having far and close data paths. The far path is configured to handle effective addition operations and effective subtraction operations for operands having an absolute exponent difference greater than one. The close path is configured to handle effective subtraction operations for operands having an absolute exponent difference less than or equal to one. The close path is configured to generate two output values, wherein one output value is the first input operand plus an inverted version of the second input operand, while the second output value is equal to the first output value plus one. Selection of the first or second output value in the close path effectuates the round-to-nearest operation for the output of the adder.
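Editorial illustration: the two close-path outputs described in this abstract follow from two's-complement arithmetic, where a - b equals a + ~b + 1. The sketch below shows only that arithmetic on fixed-width integers. The function name and the width parameter are hypothetical, and the rule the hardware uses to choose between the two values is not modelled.

```python
def close_path_candidates(a, b, width=24):
    """Return (a + ~b, a + ~b + 1) as unsigned width-bit values.

    The second value is a - b modulo 2**width; the first is one less.
    """
    mask = (1 << width) - 1
    first = (a + (~b & mask)) & mask   # first operand plus inverted second operand
    second = (first + 1) & mask        # the first output value plus one
    return first, second
```

For example, close_path_candidates(10, 3, width=8) gives the pair 6 and 7, so having both values available lets a later stage pick the rounded result without a second addition.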

9. Granted invention patent

US06397238B2 Method and apparatus for rounding in a multiplier (in force)
Publication No.: US06397238B2
Publication date: 2002-05-28
Application No.: US09782475
Filing date: 2001-02-12
Applicants: Stuart Oberman, Norbert Juffa, Ming Siu, Frederick D Weber, Ravikrishna Cherukuri
Inventors: Stuart Oberman, Norbert Juffa, Ming Siu, Frederick D Weber, Ravikrishna Cherukuri
IPC classification: G06F7/52
CPC classification: G06F7/53, G06F7/4991, G06F7/49936, G06F7/49963, G06F7/49994, G06F7/5338, G06F7/5443, G06F9/30036, G06F9/3017, G06F9/3804, G06F9/3885, G06F17/16, G06F2207/3828
Abstract: A multiplier capable of performing signed and unsigned scalar and vector multiplication is disclosed. The multiplier is configured to receive signed or unsigned multiplier and multiplicand operands in scalar or packed vector form. An effective sign for the multiplier and multiplicand operands may be calculated and used to create and select a number of partial products according to Booth's algorithm. Once the partial products have been created and selected, they may be summed and the results may be output. The results may be signed or unsigned, and may represent vector or scalar quantities. When a vector multiplication is performed, the multiplier may be configured to generate and select partial products so as to effectively isolate the multiplication process for each pair of vector components. The multiplier may also be configured to sum the products of the vector components to form the vector dot product. The final product may be output in segments so as to require fewer bus lines. The segments may be rounded by adding a rounding constant. Rounding and normalization may be performed in two paths, one assuming an overflow will occur, the other assuming no overflow will occur. The multiplier may also be configured to perform iterative calculations to evaluate constant powers of an operand. Intermediate products that are formed may be rounded and normalized in two paths and then compressed and stored for use in the next iteration. An adjustment constant may also be added to increase the frequency of exactly rounded results.
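Editorial illustration: the two-path rounding mentioned in this abstract can be sketched for the product of two normalized significands. Both candidates are formed by adding a rounding constant, and one is selected afterwards. The function name, the round-half-up rule, and the selection test are assumptions chosen to keep the example small; the abstract does not specify them.

```python
def round_product_two_paths(a, b, n=8):
    """Round the product of two n-bit significands to n bits.

    a and b are integers in [2**(n-1), 2**n), i.e. normalized significands.
    Returns (significand, exponent_increment).
    """
    low, high = 1 << (n - 1), 1 << n
    if n < 2 or not (low <= a < high and low <= b < high):
        raise ValueError("operands must be normalized n-bit significands")
    product = a * b  # exact product, either 2n-1 or 2n bits wide

    # Path 1 assumes no overflow: round with constant 2**(n-2), drop n-1 bits.
    no_overflow_sum = product + (1 << (n - 2))
    # Path 2 assumes overflow: round with constant 2**(n-1), drop n bits.
    overflow_sum = product + (1 << (n - 1))

    # Late select: bit 2n-1 of the no-overflow sum shows which path applies,
    # including the case where rounding itself carries into that bit.
    if (no_overflow_sum >> (2 * n - 1)) & 1:
        return overflow_sum >> n, 1
    return no_overflow_sum >> (n - 1), 0
```

In Python the two sums are computed one after the other; in hardware they would be formed side by side, so the selection adds little delay after the product is available.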

10. Granted invention patent

US06393555B1 Rapid execution of FCMOV following FCOMI by storing comparison result in temporary register in floating point unit (in force)
Publication No.: US06393555B1
Publication date: 2002-05-21
Application No.: US09370787
Filing date: 1999-08-05
Applicants: Stephan G. Meier, Norbert Juffa, Frederick D. Weber, Stuart F. Oberman
Inventors: Stephan G. Meier, Norbert Juffa, Frederick D. Weber, Stuart F. Oberman
IPC classification: G06F9/30
CPC classification: G06F9/30021, G06F9/30094, G06F9/30101, G06F9/3842, G06F9/3885
Abstract: A microprocessor with a floating point unit configured to rapidly execute floating point compare (FCOMI) type instructions that are followed by floating point conditional move (FCMOV) type instructions is disclosed. FCOMI-type instructions, which normally store their results to integer status flag registers, are modified to store a copy of their results to a temporary register located within the floating point unit. If an FCMOV-type instruction is detected following an FCOMI-type instruction, then the FCMOV-type instruction's source for flag information is changed from the integer flag register to the temporary register. FCMOV-type instructions are thereby able to execute earlier because they need not wait for the integer flags to be read from the integer portion of the microprocessor. A computer system and method for rapidly executing FCOMI-type instructions followed by FCMOV-type instructions are also disclosed.
