    • 61. Invention application
    • Title: Processor which Implements Fused and Unfused Multiply-Add Instructions in a Pipelined Manner
    • Publication No.: US20090248779A1
    • Publication date: 2009-10-01
    • Application No.: US12057894
    • Filing date: 2008-03-28
    • Inventors: Jeffrey S. Brooks; Christopher H. Olson
    • IPC: G06F7/44
    • CPC: G06F7/483; G06F7/5443; G06F2207/3884
    • Implementing an unfused multiply-add instruction within a fused multiply-add pipeline. The system may include an aligner having an input for receiving an addition term, a multiplier tree having two inputs for receiving a first value and a second value for multiplication, and a first carry save adder (CSA), wherein the first CSA may receive partial products from the multiplier tree and an aligned addition term from the aligner. The system may include a fused/unfused multiply add (FUMA) block which may receive the first partial product, the second partial product, and the aligned addition term, wherein the first partial product and the second partial product are not truncated. The FUMA block may perform an unfused multiply add operation or a fused multiply add operation using the first partial product, the second partial product, and the aligned addition term, e.g., depending on an opcode or mode bit.
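The distinction this abstract turns on is where rounding happens: an unfused multiply-add rounds the product before the addition, while a fused multiply-add rounds only once, after the addition. A minimal Python sketch of that numeric difference, using numpy float32 as the working format and float64 to emulate the exactly kept product; this is an illustration only, not the patented FUMA datapath:

```python
import numpy as np

a = np.float32(1.0) + np.float32(2.0 ** -12)    # exactly representable in float32
b = a
c = -(np.float32(1.0) + np.float32(2.0 ** -11))

# Unfused: the product a*b is rounded to float32 before the addition.
unfused = np.float32(a * b) + c

# Fused (emulated): keep the product exact in float64, round once at the end.
fused = np.float32(np.float64(a) * np.float64(b) + np.float64(c))

print(unfused)   # 0.0        -> the 2**-24 term is lost to the intermediate rounding
print(fused)     # ~5.96e-08  -> equals 2**-24, preserved by the single rounding
```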
    • 62. Granted invention patent
    • Title: Apparatus and method to support pipelining of differing-latency instructions in a multithreaded processor
    • Publication No.: US07478225B1
    • Publication date: 2009-01-13
    • Application No.: US10881071
    • Filing date: 2004-06-30
    • Inventors: Jeffrey S. Brooks; Christopher H. Olson; Robert T. Golla
    • IPC: G06F9/30
    • CPC: G06F9/3836; G06F9/3851; G06F9/3857; G06F9/3873
    • An apparatus and method to support pipelining of variable-latency instructions in a multithreaded processor. In one embodiment, a processor may include instruction fetch logic configured to issue a first and a second instruction from different ones of a plurality of threads during successive cycles. The processor may also include first and second execution units respectively configured to execute shorter-latency and longer-latency instructions and to respectively write shorter-latency or longer-latency instruction results to a result write port during a first or second writeback stage. The first writeback stage may occur a fewer number of cycles after instruction issue than the second writeback stage. The instruction fetch logic may be further configured to guarantee result write port access by the second execution unit during the second writeback stage by preventing the shorter-latency instruction from issuing during a cycle for which the first writeback stage collides with the second writeback stage.
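The scheduling rule in this abstract can be illustrated with a toy issue model: short- and long-latency instructions share one result write port, and a short-latency instruction is held back whenever its writeback cycle would land on a slot already reserved by a long-latency instruction. A hedged sketch; the 2-cycle and 5-cycle latencies and the single shared port are assumptions, not the patent's actual numbers:

```python
# Toy issue scheduler: both execution units share one result write port, and an
# instruction may not issue in a cycle whose writeback slot is already reserved.
SHORT_LAT, LONG_LAT = 2, 5        # assumed writeback latencies (cycles after issue)

def schedule(ops):
    reserved = set()              # cycles in which the shared write port is taken
    out, cycle = [], 0
    for op in ops:
        lat = SHORT_LAT if op == "short" else LONG_LAT
        while cycle + lat in reserved:   # stall: writeback would collide
            cycle += 1
        reserved.add(cycle + lat)
        out.append((op, cycle, cycle + lat))
        cycle += 1                # at most one instruction issues per cycle
    return out

for op, issue, wb in schedule(["long", "short", "short", "short", "short"]):
    print(f"{op:5s} issues @ cycle {issue}, writes back @ cycle {wb}")
```

In the printed trace the fourth instruction is delayed one cycle because its writeback would have collided with the long-latency instruction issued first, which is exactly the guarantee the abstract describes.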
    • 63. Granted invention patent
    • Title: Execution unit for performing the data encryption standard
    • Publication No.: US07443981B1
    • Publication date: 2008-10-28
    • Application No.: US10676554
    • Filing date: 2003-10-01
    • Inventors: Leonard D. Rarick; Christopher H. Olson
    • IPC: H04K1/00
    • CPC: H04L9/0625; H04L2209/12
    • An execution unit adapted to perform at least a portion of the Data Encryption Standard. The execution unit includes a Left Half input; a Key input; and a Table input. The execution unit also includes a first group of transistors configured to receive the Table input, perform a table look-up, and output data. The execution unit further includes a first exclusive-or operator having two inputs and an output. The first exclusive-or operator is configured to receive the Left Half input and the Key input. The execution unit also includes a second exclusive-or operator having two inputs and an output. The second exclusive-or operator is configured to receive the data output by the first group of transistors and to receive the output of the first exclusive-or operator. The execution unit also includes a third exclusive-or operator having two inputs and an output. The third exclusive-or operator is configured to receive the Left Half input and the data output by the first group of transistors.
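A dataflow sketch of the three exclusive-or operators and the table look-up enumerated in this abstract. The 8-bit width and the stand-in look-up table are placeholders for illustration; this is not the patent's actual circuit, nor a complete DES round:

```python
# Placeholder look-up table standing in for the "first group of transistors".
TABLE = [(i * 0x9E) & 0xFF for i in range(256)]

def des_helper_step(left_half: int, key: int, table_in: int):
    table_out = TABLE[table_in & 0xFF]    # table look-up on the Table input
    xor1 = left_half ^ key                # first XOR:  Left Half ^ Key
    xor2 = table_out ^ xor1               # second XOR: table output ^ first XOR output
    xor3 = left_half ^ table_out          # third XOR:  Left Half ^ table output
    return xor1, xor2, xor3

print([hex(v) for v in des_helper_step(0x5A, 0x3C, 0x7E)])
```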
    • 64. Granted invention patent
    • Title: Efficient floating point normalization mechanism
    • Publication No.: US5957997A
    • Publication date: 1999-09-28
    • Application No.: US840926
    • Filing date: 1997-04-25
    • Inventors: Christopher H. Olson; Martin S. Schmookler
    • IPC: G06F5/01
    • CPC: G06F5/012
    • A floating point result in a processor is efficiently normalized by predicting the mantissa shift required to normalize the result to an error of one bit position in one direction, resulting in minimum and maximum predicted shifts. Concurrently with an addition of operands to generate a result mantissa, an inversion of the minimum predicted shift is added to the operand exponent to generate an intermediate exponent corresponding to a maximum predicted shift. When the operand addition is complete, the result mantissa is partially shifted in response to the minimum predicted shift. The location of the leading one is then ascertained and compared to the remaining minimum predicted shift. If the minimum predicted shift is the actual shift required to normalize the result, the result mantissa is further shifted by the remaining minimum predicted shift and an exponent carry-in is asserted. On the other hand, if the maximum predicted shift is the actual shift required, the result mantissa is further shifted by the remaining minimum shift and by an additional bit position and the exponent carry-in is not asserted.
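The two-step normalization this abstract describes — shift by the minimum predicted amount, then correct by at most one more position once the leading one is located — can be modelled with plain integers. A toy sketch; the 16-bit width, the left-shift direction, and the simplified exponent bookkeeping are assumptions, not the patented datapath:

```python
WIDTH = 16                    # assumed significand width for the toy model

def normalize(mantissa: int, exponent: int, min_pred_shift: int):
    """Shift by the minimum predicted amount, then by one more bit if the
    prediction was short (the prediction is assumed accurate to one bit)."""
    m = mantissa << min_pred_shift
    if m & (1 << (WIDTH - 1)):            # leading one already in the top position
        return m, exponent - min_pred_shift
    return m << 1, exponent - (min_pred_shift + 1)

value = 0b0000_0011_0000_0000             # leading one in bit 9
print(normalize(value, exponent=10, min_pred_shift=6))   # exact prediction
print(normalize(value, exponent=10, min_pred_shift=5))   # prediction one bit short
```

Both calls yield the same normalized pair, illustrating how the extra one-bit shift and the adjusted exponent absorb the prediction error.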
    • 65. Granted invention patent
    • Title: Floating-point processor having post-writeback spill stage
    • Publication No.: US5583805A
    • Publication date: 1996-12-10
    • Application No.: US352661
    • Filing date: 1994-12-09
    • Inventors: Timothy A. Elliott; Robert T. Golla; Christopher H. Olson; Terence M. Potter
    • IPC: G06F7/57; G06F7/38
    • CPC: G06F7/483; G06F7/49915
    • An apparatus for handling special cases outside of normal floating-point arithmetic functions is provided that is used in a floating-point unit used for calculating arithmetic functions. The floating-point unit generates an exponent portion and a mantissa portion and a writeback stage is coupled to the exponent portion and to the mantissa portion and is specifically used to handle the special cases outside the normal float arithmetic functions. A spill stage is also provided and is coupled to the writeback stage to receive a resultant exponent and mantissa. A register file unit is coupled to the writeback stage and the spill stage through a plurality of rename busses, which are used to carry results between the writeback stage and spill stage and the register file. The spill stage is serially coupled to the writeback stage so as to provide a smooth operation in the transition of operating on the results from the writeback stage for the exponent and mantissa. Each rename bus has a pair of tri-state buffers, one used to couple the rename bus to the writeback stage and the other used to couple the rename bus to the spill stage. The instruction dispatcher also provides location information for directing the results from the writeback stage and the spill stage before the result is completed.
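A structural toy of the arrangement in this abstract: the writeback stage and the serially coupled spill stage each sit behind a tri-state buffer on a shared rename bus, so only one of them drives the register file in a given cycle. The single-entry spill buffer and the drive policy below are assumptions made purely for illustration:

```python
class RenameBus:
    """Shared bus: at most one tri-state driver may be enabled per cycle."""
    def __init__(self):
        self.value = None
    def drive(self, source: str, result):
        assert self.value is None, "two drivers enabled on the same rename bus"
        self.value = (source, result)

def one_cycle(bus: RenameBus, writeback_result, spill_buffer):
    """Returns the spill-buffer contents for the next cycle."""
    if spill_buffer is not None:          # spill stage gets the bus this cycle
        bus.drive("spill", spill_buffer)
        return writeback_result           # writeback hands its result to the spill stage
    if writeback_result is not None:
        bus.drive("writeback", writeback_result)
    return None

bus = RenameBus()
spill = one_cycle(bus, writeback_result=("exp", "mantissa"), spill_buffer=None)
print(bus.value, spill)
```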
    • 66. Granted invention patent
    • Title: Method and system for high speed floating point exception enabled operation in a multiscalar processor system
    • Publication No.: US5410657A
    • Publication date: 1995-04-25
    • Application No.: US959193
    • Filing date: 1992-10-09
    • Inventors: Christopher H. Olson; Terence M. Potter
    • IPC: G06F7/00; G06F9/38; G06F9/28; G06F9/30; G06F15/16; G06F15/347
    • CPC: G06F9/3836; G06F9/3857; G06F9/3861
    • A method and system are disclosed for implementing floating point exception enabled operation without substantial performance degradation. In a multiscalar processor system, multiple instructions may be issued and executed simultaneously utilizing multiple independent functional units. This is typically accomplished utilizing separate branch, fixed point and floating point processor units. Floating point arithmetic instructions within the floating point processor unit may initiate one of a variety of exceptions associated within invalid operations and as a result of the pipelined nature of floating point processor units an identification of which instruction initiated the exception is not possible. In the described method and system, an associated dummy instruction having a retained instruction address is dispatched to the fixed point processor unit each time a floating point arithmetic instruction is dispatched to the floating point processor unit. Thereafter, the output of each instruction from the floating point processor unit is synchronized with an output of an associated dummy instruction wherein each instruction within the floating point processor unit which initiates a floating point exception may be accurately identified utilizing the retained instruction address of the associated dummy instruction.
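The pairing scheme in this abstract — every floating-point arithmetic instruction dispatched to the FP unit is accompanied by a dummy instruction, carrying the instruction address, dispatched to the fixed-point unit, and the two streams drain in lockstep — can be sketched with two queues. The opcodes, addresses, and exception flag below are made up for the demo:

```python
from collections import deque

fp_pipe, dummy_pipe = deque(), deque()

def dispatch(address: int, opcode: str, raises_exception: bool = False):
    fp_pipe.append((opcode, raises_exception))
    dummy_pipe.append(address)            # the dummy retains the instruction address

def retire_one():
    opcode, exc = fp_pipe.popleft()
    address = dummy_pipe.popleft()        # synchronized with the FP unit's output
    if exc:
        print(f"floating-point exception in {opcode} at {address:#x}")

dispatch(0x1000, "fmul")
dispatch(0x1004, "fdiv", raises_exception=True)
dispatch(0x1008, "fadd")
for _ in range(3):
    retire_one()
```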
    • 68. Granted invention patent
    • Title: Processor pipeline which implements fused and unfused multiply-add instructions
    • Publication No.: US08977670B2
    • Publication date: 2015-03-10
    • Application No.: US13469212
    • Filing date: 2012-05-11
    • Inventors: Jeffrey S. Brooks; Christopher H. Olson
    • IPC: G06F7/38; G06F7/483; G06F7/544
    • CPC: G06F7/483; G06F7/5443; G06F2207/3884
    • Implementing an unfused multiply-add instruction within a fused multiply-add pipeline. The system may include an aligner having an input for receiving an addition term, a multiplier tree having two inputs for receiving a first value and a second value for multiplication, and a first carry save adder (CSA), wherein the first CSA may receive partial products from the multiplier tree and an aligned addition term from the aligner. The system may include a fused/unfused multiply add (FUMA) block which may receive the first partial product, the second partial product, and the aligned addition term, wherein the first partial product and the second partial product are not truncated. The FUMA block may perform an unfused multiply add operation or a fused multiply add operation using the first partial product, the second partial product, and the aligned addition term, e.g., depending on an opcode or mode bit.
    • 69. Invention application
    • Title: DIVISION UNIT WITH MULTIPLE DIVIDE ENGINES
    • Publication No.: US20130179664A1
    • Publication date: 2013-07-11
    • Application No.: US13345391
    • Filing date: 2012-01-06
    • Inventors: Christopher H. Olson; Jeffrey S. Brooks; Matthew B. Smittle
    • IPC: G06F7/487; G06F9/38; G06F7/537; G06F9/302; G06F5/01; G06F9/30
    • CPC: G06F9/3895; G06F7/49936; G06F7/535; G06F7/5375; G06F9/3001; G06F9/3875; G06F9/3885
    • Techniques are disclosed relating to integrated circuits that include hardware support for divide and/or square root operations. In one embodiment, an integrated circuit is disclosed that includes a division unit that, in turn, includes a normalization circuit and a plurality of divide engines. The normalization circuit is configured to normalize a set of operands. Each divide engine is configured to operate on a respective normalized set of operands received from the normalization circuit. In some embodiments, the integrated circuit includes a scheduler unit configured to select instructions for issuance to a plurality of execution units including the division unit. The scheduler unit is further configured to maintain a counter indicative of a number of instructions currently being operated on by the division unit, and to determine, based on the counter whether to schedule subsequent instructions for issuance to the division unit.
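A toy of the issue gating described in this abstract: a shared normalization step feeds several divide engines, and a counter of in-flight divides decides whether another divide may be scheduled. The two engines and the normalization into [1, 2) are assumptions for the sketch:

```python
import math

NUM_ENGINES = 2                   # assumed number of divide engines
in_flight = 0                     # counter maintained by the scheduler

def normalize(x: float):
    """Shared front end: scale x into [1, 2), returning (significand, exponent)."""
    m, e = math.frexp(x)          # m is in [0.5, 1)
    return m * 2.0, e - 1

def try_issue_divide(a: float, b: float):
    global in_flight
    if in_flight >= NUM_ENGINES:  # all engines busy: hold the instruction
        return None
    in_flight += 1                # one more divide now in flight
    return normalize(a), normalize(b)   # normalized operands go to a free engine

def retire_divide():
    global in_flight
    in_flight -= 1                # an engine has finished; issue may resume

print(try_issue_divide(10.0, 3.0))
print(try_issue_divide(7.5, 0.25))
print(try_issue_divide(1.0, 2.0))       # None: both assumed engines are busy
```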
    • 70. Invention application
    • Title: STORING A TARGET ADDRESS OF A CONTROL TRANSFER INSTRUCTION IN AN INSTRUCTION FIELD
    • Publication No.: US20130138888A1
    • Publication date: 2013-05-30
    • Application No.: US13307850
    • Filing date: 2011-11-30
    • Inventors: Jama I. Barreh; Manish K. Shah; Christopher H. Olson
    • IPC: G06F12/08
    • CPC: G06F9/324; G06F9/382; G06F12/0862; Y02D10/13
    • A control transfer instruction (CTI), such as a branch, jump, etc., may have an offset value for a control transfer that is to be performed. The offset value may be usable to compute a target address for the CTI (e.g., the address of a next instruction to be executed for a thread or instruction stream). The offset may be specified relative to a program counter. In response to detecting a specified offset value, the CTI may be modified to include at least a portion of a computed target address. Information indicating this modification has been performed may be stored, for example, in a pre-decode bit. In some cases, CTI modification may be performed only when a target address is a “near” target, rather than a “far” target. Modifying CTIs as described herein may eliminate redundant address calculations and produce a savings of power and/or time in some embodiments.
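The predecode rewrite in this abstract can be sketched as follows: when a control transfer instruction's target is "near", the PC-relative offset field is replaced with the low bits of the already-computed target and a predecode bit records the rewrite, so later executions of the instruction avoid the addition. The 16-bit field, the near/far threshold, and the bit-splicing below are assumptions for illustration:

```python
NEAR_LIMIT = 1 << 12              # assumed threshold for calling a target "near"
FIELD_MASK = (1 << 16) - 1        # assumed width of the rewritten instruction field

def predecode(pc: int, offset: int) -> dict:
    target = pc + offset
    if abs(offset) < NEAR_LIMIT and (target & ~FIELD_MASK) == (pc & ~FIELD_MASK):
        # Near target: store the low bits of the target in the instruction field.
        return {"field": target & FIELD_MASK, "predecoded": True}
    return {"field": offset, "predecoded": False}     # far target: keep the raw offset

def resolve_target(pc: int, inst: dict) -> int:
    if inst["predecoded"]:
        return (pc & ~FIELD_MASK) | inst["field"]     # no addition on this path
    return pc + inst["field"]                         # recompute for far targets

inst = predecode(0x0040_1000, offset=0x40)
print(hex(resolve_target(0x0040_1000, inst)))         # 0x401040
```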