    • 1. Granted invention patent
    • Processor and method for managing execution of an instruction which determine subsequent to dispatch if an instruction is subject to serialization
    • US5678016A
    • 1997-10-14
    • US512741
    • 1995-08-08
    • Lee E. Eisen; Robert T. Golla; Christopher H. Olson; Michael Putrino
    • G06F9/312; G06F9/38
    • G06F9/30043; G06F9/3836; G06F9/384
    • A method and apparatus are disclosed for managing the execution of a floating-point store instruction within a data processing system including a memory and a superscalar processor having a number of floating-point registers (FPRs). According to the present invention, multiple instructions are dispatched for execution by the processor, including a floating-point store instruction having as an operand the content of a particular FPR. A determination is made whether the particular FPR is a destination register for results of a second instruction which precedes the store instruction in program order. If so, a determination is made whether the second instruction must complete before subsequent instructions can be successfully dispatched. In response to a determination that the second instruction must be completed prior to successfully dispatching subsequent instructions, the floating-point instruction is cancelled and redispatched after the completion of the second instruction. In response to a determination that the second instruction need not be completed prior to successfully dispatching subsequent instructions, execution of the floating-point store instruction is initiated by computing the destination address within memory into which the operand of the floating-point store instruction is to be stored, thereby minimizing the delay in executing a floating-point store instruction.
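The dispatch-time decision described in this abstract can be sketched in a few lines. This is only an illustrative software model, not the patented hardware: the `Instr` record, its `serializing` flag, and the returned action names are all hypothetical, and the "no dependency" outcome is an assumption the abstract does not spell out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instr:
    name: str
    dest_fpr: Optional[int] = None  # FPR written by this instruction, if any
    serializing: bool = False       # hypothetical: must complete before later dispatches succeed
    completed: bool = False


def handle_fp_store(source_fpr: int, older: List[Instr]) -> str:
    """Decide what to do with a dispatched floating-point store.

    `older` holds the instructions that precede the store in program order.
    """
    # Look for the youngest older instruction that writes the store's source FPR.
    for instr in reversed(older):
        if instr.dest_fpr == source_fpr:
            if instr.completed:
                break  # the operand is already available
            if instr.serializing:
                # Producer must complete first: cancel the store, redispatch later.
                return "cancel_and_redispatch"
            # Producer is merely pending: start by computing the store address.
            return "compute_store_address"
    # Assumption: with no pending producer the store simply executes.
    return "execute_normally"
```

The point of the second branch is that address generation does not need the operand, so it can overlap with the producer's execution.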
    • 2. Granted invention patent
    • Method and apparatus for executing fixed-point instructions within idle execution units of a superscalar processor
    • US5809323A
    • 1998-09-15
    • US530552
    • 1995-09-19
    • Lee E. Eisen; Robert T. Golla; Soummya Mallick; Sung-Ho Park; Rajesh B. Patel; Michael Putrino
    • G06F9/302; G06F9/38
    • G06F9/3001; G06F9/3836; G06F9/384
    • A superscalar processor and method for executing fixed-point instructions within a superscalar processor are disclosed. The superscalar processor has a memory and multiple execution units, including a fixed point execution unit (FXU) and a non-fixed point execution unit (non-FXU). According to the present invention, a set of instructions to be executed are fetched from among a number of instructions stored within memory. A determination is then made if n instructions, the maximum number possible, can be dispatched to the multiple execution units during a first processor cycle if fixed point arithmetic and logical instructions are dispatched only to the FXU. If so, n instructions are dispatched to the multiple execution units for execution. In response to a determination that n instructions cannot be dispatched during the first processor cycle, a determination is made whether a fixed point instruction is available to be dispatched and whether dispatching the fixed point instruction to the non-FXU for execution will result in greater efficiency. In response to a determination that a fixed point instruction is not available to be dispatched or that dispatching the fixed point instruction to the non-FXU will not result in greater efficiency, dispatch of the fixed point instruction is delayed until a second processor cycle. However, in response to a determination that dispatching the fixed point instruction to the non-FXU will result in greater efficiency, the fixed point instruction is dispatched to the non-FXU and executed, thereby improving execution unit utilization.
    • 3. Granted invention patent
    • Load-store unit and method of loading and storing single-precision floating-point registers in a double-precision architecture
    • US5805475A
    • 1998-09-08
    • US816067
    • 1997-03-11
    • Michael Putrino; Lee E. Eisen
    • G06F7/57; G06F9/30; G06F9/302; G06F9/312; G06F7/00; G06F7/38
    • G06F9/30043; G06F7/483; G06F9/30014; G06F9/30025; G06F2207/382; G06F7/49905
    • A floating point numbers load-store unit includes a translator for converting between the single-precision and double-precision representations, and Special-Case logic for providing Special-Case signals when a store is being performed on zero, infinity, or NaN. A store-float-double instruction is executed by concatenating a suffix to the mantissa in the single-precision floating-point register and replacing the high-order bit of the exponent with a prefix selected as a function of the high-order bit, wherein the resulting mantissa and exponent form a double-precision floating-point number that is then stored to memory. A load-float-double instruction is executed by dropping the suffix from the mantissa of the double-precision floating-point number in memory, and replacing the prefix with the high-order bit, wherein the resulting mantissa and exponent form a single-precision floating-point number that is then loaded into the single-precision floating-point register.
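The suffix/prefix conversion in this abstract can be illustrated on IEEE 754 bit patterns. The sketch below assumes the standard binary32 and binary64 layouts, which imply a 29-bit zero suffix on the mantissa and a 4-bit exponent prefix (`1000` when the high-order exponent bit is 1, `0111` when it is 0). The abstract does not state these widths, so treat them as assumptions. Zero, infinity, and NaN are handled as special cases, and denormals are deliberately left out.

```python
import struct


def single_to_double_bits(bits32: int) -> int:
    """Widen a binary32 bit pattern to binary64 without arithmetic on the value."""
    sign = (bits32 >> 31) & 0x1
    exp8 = (bits32 >> 23) & 0xFF
    frac23 = bits32 & 0x7FFFFF
    frac52 = frac23 << 29  # concatenate a 29-bit zero suffix to the mantissa

    if exp8 == 0:
        if frac23 != 0:
            raise ValueError("denormal inputs are not handled in this sketch")
        exp11 = 0            # special case: zero
    elif exp8 == 0xFF:
        exp11 = 0x7FF        # special case: infinity or NaN
    else:
        high = exp8 >> 7     # high-order exponent bit
        low7 = exp8 & 0x7F
        prefix = 0b1000 if high else 0b0111  # prefix chosen from the high bit
        exp11 = (prefix << 7) | low7
    return (sign << 63) | (exp11 << 52) | frac52


def float_to_bits32(x: float) -> int:
    """Helper: binary32 bit pattern of x (rounded to single precision)."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def bits64_to_float(bits64: int) -> float:
    """Helper: interpret a 64-bit pattern as a double."""
    return struct.unpack(">d", struct.pack(">Q", bits64))[0]
```

For normal numbers the prefix trick is equivalent to adding the bias difference 1023 - 127 = 896 to the exponent, which is why no adder is needed. For example, `bits64_to_float(single_to_double_bits(float_to_bits32(-2.5)))` gives back `-2.5`.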
    • 4. Granted invention patent
    • Processor having vector processing capability and method for executing a vector instruction in a processor
    • US06324638B1
    • 2001-11-27
    • US09282268
    • 1999-03-31
    • Thomas Elmer; Michael Putrino
    • G06F1517
    • G06F7/5324; G06F7/5332; G06F9/30014; G06F9/30036; G06F2207/382; G06F2207/3828
    • A processor capable of executing vector instructions includes at least an instruction sequencing unit and a vector processing unit that receives vector instructions to be executed from the instruction sequencing unit. The vector processing unit includes a plurality of multiply structures, each containing only a single multiply array, that each correspond to at least one element of a vector input operand. Utilizing the single multiply array, each of the plurality of multiply structures is capable of performing a multiplication operation on one element of a vector input operand and is also capable of performing a multiplication operation on multiple elements of a vector input operand concurrently. In an embodiment in which the maximum length of an element of a vector input operand is N bits, each of the plurality of multiply arrays can handle both N by N bit integer multiplication and M by M bit integer multiplication, where N is a non-unitary integer multiple of M. At least one of the multiply structures also preferably includes an accumulating adder that receives as a first input a result produced by that multiply structure and receives as a second input a result produced by another multiply structure. From these inputs, the accumulating adder produces as an output an accumulated sum of the results in response to execution of the same instruction that caused the multiply structures to produce the intermediate results.
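One way a single multiply array can serve both an N by N multiplication and several M by M multiplications is to suppress the partial products that cross element boundaries. The sketch below models that idea in software for unsigned operands with N = 2M. The function name, the `split` switch, and the packing of two elements per operand are illustrative assumptions, not details taken from the patent.

```python
def multiply_array(a: int, b: int, n: int = 16, split: bool = False) -> int:
    """Model an n-by-n array of single-bit partial products (unsigned).

    With split=False the array computes one n-by-n product.
    With split=True each operand is treated as two packed n/2-bit elements,
    and the result holds two independent products side by side.
    """
    m = n // 2
    total = 0
    for i in range(n):
        if not (a >> i) & 1:
            continue
        for j in range(n):
            if not (b >> j) & 1:
                continue
            if split and (i < m) != (j < m):
                continue  # drop partial products that mix the two elements
            total += 1 << (i + j)
    return total
```

In split mode the low-element product occupies bits 0 to n-1 of the result and the high-element product occupies bits n to 2n-1, so the two never overlap.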
    • 5. Granted invention patent
    • Method and apparatus for dynamic allocation of registers for intermediate floating-point results
    • US5805916A
    • 1998-09-08
    • US758017
    • 1996-11-27
    • Soummya Mallick; Michael Putrino; Romesh Mangho Jessani
    • G06F9/302; G06F9/38
    • G06F9/30014; G06F9/30105; G06F9/30112; G06F9/3836; G06F9/384; G06F9/3855; G06F9/3857; G06F9/3875
    • The present invention relates to a multiple stage execution unit for executing instructions in a microprocessor having a plurality of rename registers for storing execution results, an instruction cache for storing instructions, each instruction being associated with a rename register, a sequencer unit for providing an instruction to the execution unit, and a data cache for providing data to the execution unit. In one version, the execution unit includes a first stage which generates an intermediate result from the data according to an instruction; a means for providing a first portion of the intermediate result to an intermediate register; a means for providing a second portion of the intermediate result to a rename register associated with the instruction; a means for passing the first portion from the intermediate register to a second stage of the execution unit; a means for passing the second portion from the rename register to the second stage of the execution unit; wherein the second stage of the execution unit operates on the first and second portions according to the instruction.
    • 7. Granted invention patent
    • Method for implementing a four-way least recently used (LRU) mechanism in high-performance
    • US5765191A
    • 1998-06-09
    • US641060
    • 1996-04-29
    • Albert John Loper; Soummya Mallick; Rajesh Bhikhubhai Patel; Michael Putrino
    • G06F12/08; G06F12/12
    • G06F12/123
    • A method for implementing a four-way least recently used cache line replacement scheme in a four-way cache memory is disclosed. The cache memory includes multiple cache lines, and each cache line includes four congruence sets. In accordance with the present disclosure, a 5-bit Least Recently Used (LRU) field is associated with each of the cache lines within the cache memory. For a particular cache line, a set number of a least recently used set among the four congruence sets is stored in any two bits of the LRU field associated with that cache line. Next, a set number of the second least recently used set among the four congruence sets is stored in another two bits of the same LRU field associated with the same cache line. Finally, a last bit of the 5-bit LRU field is set to a specific state in response to a determination of which one of the remaining two sets is the second most recently used set.
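The 5-bit LRU field in this abstract is enough to record a full ordering of four sets: two bits name the least recently used set, two bits name the second least recently used set, and one bit orders the remaining two. A minimal sketch follows. The bit positions and the meaning of the last bit (1 when the higher-numbered remaining set is the second most recently used) are assumptions, since the abstract allows "any two bits" and only says the last bit is set to "a specific state".

```python
from itertools import permutations
from typing import List


def encode_lru(order: List[int]) -> int:
    """Pack an ordering (least to most recently used) of sets 0..3 into 5 bits."""
    lru, second_lru, second_mru, mru = order
    last_bit = 1 if second_mru > mru else 0  # assumed convention
    return (lru << 3) | (second_lru << 1) | last_bit


def decode_lru(field: int) -> List[int]:
    """Recover the full ordering from the 5-bit field."""
    lru = (field >> 3) & 0b11
    second_lru = (field >> 1) & 0b11
    last_bit = field & 1
    lo, hi = sorted(s for s in range(4) if s not in (lru, second_lru))
    second_mru, mru = (hi, lo) if last_bit else (lo, hi)
    return [lru, second_lru, second_mru, mru]


def touch(field: int, used_set: int) -> int:
    """Update the field after `used_set` is accessed (it becomes most recent)."""
    order = decode_lru(field)
    order.remove(used_set)
    order.append(used_set)
    return encode_lru(order)


def round_trip_ok() -> bool:
    """Check that all 24 orderings survive encode followed by decode."""
    return all(decode_lru(encode_lru(list(p))) == list(p)
               for p in permutations(range(4)))
```

Four sets have 24 possible orderings, which fit in 5 bits (32 states), whereas storing all four set numbers explicitly would take 8 bits. The replacement victim is always `field >> 3`.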
    • 9. Granted invention patent
    • System for completing instruction out-of-order which performs target address comparisons prior to dispatch
    • US6098168A
    • 2000-08-01
    • US46867
    • 1998-03-24
    • Lee Evan Eisen; Michael Putrino
    • G06F9/38
    • G06F9/3842; G06F9/3836; G06F9/384; G06F9/3855; G06F9/3857
    • A mechanism structured to check for instruction collisions at the Dispatch Unit rather than the Completion Unit. In processors which issue multiple commands simultaneously, a flag bit is sent to the Completion Unit and attached to the instruction in the queue that follows the other in program order if they both have the same targeted address. When the instructions from position 1 and position 2 of the instruction queue are ready to issue, the Completion Unit checks position 2 for a flag bit. If there is a bit, then the instruction in position 1 is discarded and the instruction in position 2 is written to the target address. If there is no flag bit with the instruction in position 2, the instruction in position 1 is written to the target register. This method eliminates the need to compare all the targeted addresses that are associated with the rename registers. It requires two comparisons instead of a minimum of 15 comparisons.
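The flag-bit scheme in this abstract moves the target comparison to dispatch time, so completion only has to test one bit. The following is a rough software model under stated assumptions: the `Instr` record, the function names, and the dictionary standing in for the register file are hypothetical. Writing the second instruction's result in the no-collision case is also an assumption, since the abstract only mentions the first.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Instr:
    name: str
    target: int                          # target register number
    result: int                          # value the instruction produces
    collides_with_previous: bool = False  # flag bit set at dispatch


def dispatch_pair(first: Instr, second: Instr) -> Tuple[Instr, Instr]:
    """Dispatch stage: compare targets once and flag the later instruction."""
    second.collides_with_previous = first.target == second.target
    return first, second


def complete_pair(pair: Tuple[Instr, Instr], registers: Dict[int, int]) -> None:
    """Completion stage: inspect only the flag bit, never the target addresses."""
    first, second = pair
    if second.collides_with_previous:
        # The later write wins, so the earlier result is discarded.
        registers[second.target] = second.result
    else:
        registers[first.target] = first.result
        registers[second.target] = second.result  # assumption: both complete together
```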
    • 10. Granted invention patent
    • Method and system for fast determination of sticky and guard bits
    • US5805487A
    • 1998-09-08
    • US677843
    • 1996-07-12
    • Timothy Alan Elliott; Christopher Hans Olson; Michael Putrino
    • G06F7/38; G06F7/00; G06F7/483; G06F7/57; G06F7/76; G06F7/48
    • G06F7/483; G06F7/49952; G06F7/49957
    • A method and system for fast calculation of the sticky bit and a function of the guard bit is disclosed. A first aspect of the method and system provides a fast calculation of the sticky bit. A second aspect provides a fast calculation of a function of the guard bit. Both aspects comprise means for providing an intermediate result of a floating point mathematical operation involving at least a first and a second operand and means for providing a mask indicating a position of a leading one in a mantissa of the intermediate result. In the first aspect, means for aligning a first bit of the mask to an (n+2)nd bit of the intermediate result, where n is the number of bits in a mantissa of the first or second operand, are coupled to the intermediate result providing means. In the second aspect, means for aligning a first bit of the mask to an (n+1)st bit of the intermediate result are coupled to the intermediate result providing means. In both aspects, means for providing an output are coupled to the aligning means and intermediate result providing means. The output of the first aspect comprises the sticky bit. The output of the second aspect comprises a function of the guard bit. Thus, the method and system allow the sticky bit and a function of the guard bit to be calculated substantially simultaneously with normalization. Because the method and system allow fast determination of the sticky bit and a function of the guard bit, the overall speed of the calculation is increased and system performance is improved.
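The mask alignment described in this abstract can be mimicked in software: once the position of the leading one is known, the guard bit is the (n+1)st bit counted from it and the sticky bit is the OR of everything from the (n+2)nd bit downward. The sketch below is a sequential analogue under assumptions: it returns the guard bit itself rather than "a function of the guard bit", and it does not model the parallelism with normalization that the patent claims. The function name is hypothetical.

```python
from typing import Tuple


def guard_and_sticky(intermediate: int, n: int) -> Tuple[int, int]:
    """Return (guard, sticky) for an unnormalized intermediate result.

    n is the number of mantissa bits kept, counting the leading one.
    """
    if intermediate == 0:
        return 0, 0
    leading = intermediate.bit_length() - 1  # bit index of the leading one
    guard_pos = leading - n                  # index of the (n+1)st bit
    if guard_pos < 0:
        return 0, 0                          # result already fits in n bits
    guard = (intermediate >> guard_pos) & 1
    sticky_mask = (1 << guard_pos) - 1       # covers the (n+2)nd bit and below
    sticky = 1 if intermediate & sticky_mask else 0
    return guard, sticky
```

For instance, with `n = 4` the value `0b10110100` keeps `1011`, has guard bit 0, and has sticky bit 1 because a one remains below the guard position.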