    • 1. Granted invention patent
    • Method for implementing a four-way least recently used (LRU) mechanism in high-performance data processing systems
    • US5765191A
    • 1998-06-09
    • US641060
    • 1996-04-29
    • Albert John Loper; Soummya Mallick; Rajesh Bhikhubhai Patel; Michael Putrino
    • G06F12/08; G06F12/12
    • G06F12/123
    • A method for implementing a four-way least recently used cache line replacement scheme in a four-way cache memory is disclosed. The cache memory includes multiple cache lines, and each cache line includes four congruence sets. In accordance with the present disclosure, a 5-bit Least Recently Used (LRU) field is associated with each of the cache lines within the cache memory. For a particular cache line, a set number of a least recently used set among the four congruence sets is stored in any two bits of the LRU field associated with that cache line. Next, a set number of the second least recently used set among the four congruence sets is stored in another two bits of the same LRU field associated with the same cache line. Finally, a last bit of the 5-bit LRU field is set to a specific state in response to a determination of which one of the remaining two sets is the second most recently used set.
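The 5-bit LRU field described in this abstract can be illustrated with a small sketch. The bit layout (bits 4:3 for the least recently used set, bits 2:1 for the second least recently used set, bit 0 as the flag) and the meaning of the flag values are assumptions made for illustration; the abstract only says that "any two bits" and "another two bits" are used. The function names are hypothetical.

```python
# Minimal sketch of a 5-bit LRU field for a four-way cache.
# Assumed layout: [4:3] LRU set, [2:1] second LRU set, [0] flag.
# Assumed flag meaning: 0 -> the lower-numbered of the two remaining sets
# is the second most recently used; 1 -> the higher-numbered one is.

def encode_lru(order):
    """Pack a usage order (least recently used first) into 5 bits."""
    if sorted(order) != [0, 1, 2, 3]:
        raise ValueError("order must be a permutation of 0..3")
    lru, second_lru, second_mru, _mru = order
    remaining = sorted(order[2:])
    flag = remaining.index(second_mru)
    return (lru << 3) | (second_lru << 1) | flag

def decode_lru(field):
    """Recover the full usage order (least recently used first)."""
    lru = (field >> 3) & 0b11
    second_lru = (field >> 1) & 0b11
    flag = field & 1
    if lru == second_lru:
        raise ValueError("invalid LRU field")
    remaining = sorted({0, 1, 2, 3} - {lru, second_lru})
    return [lru, second_lru, remaining[flag], remaining[1 - flag]]

def victim(field):
    """Set number to replace next: the least recently used set."""
    return (field >> 3) & 0b11

def touch(field, way):
    """Return the updated field after `way` becomes most recently used."""
    order = decode_lru(field)
    order.remove(way)
    order.append(way)
    return encode_lru(order)
```

Five bits suffice because, once the two least recently used sets are named, only two orderings of the remaining pair are possible, so 4 × 3 × 2 = 24 orderings fit in 32 codes.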
    • 3. Granted invention patent
    • Method and apparatus for dynamic allocation of registers for intermediate floating-point results
    • US5805916A
    • 1998-09-08
    • US758017
    • 1996-11-27
    • Soummya Mallick; Michael Putrino; Romesh Mangho Jessani
    • G06F9/302; G06F9/38
    • G06F9/30014; G06F9/30105; G06F9/30112; G06F9/3836; G06F9/384; G06F9/3855; G06F9/3857; G06F9/3875
    • The present invention relates to a multiple stage execution unit for executing instructions in a microprocessor having a plurality of rename registers for storing execution results, an instruction cache for storing instructions, each instruction being associated with a rename register, a sequencer unit for providing an instruction to the execution unit, and a data cache for providing data to the execution unit. In one version, the execution unit includes a first stage which generates an intermediate result from the data according to an instruction; a means for providing a first portion of the intermediate result to an intermediate register; a means for providing a second portion of the intermediate result to a rename register associated with the instruction; a means for passing the first portion from the intermediate register to a second stage of the execution unit; a means for passing the second portion from the rename register to the second stage of the execution unit; wherein the second stage of the execution unit operates on the first and second portions according to the instruction.
    • 5. Granted invention patent
    • Method and apparatus for executing fixed-point instructions within idle execution units of a superscalar processor
    • US5809323A
    • 1998-09-15
    • US530552
    • 1995-09-19
    • Lee E. Eisen; Robert T. Golla; Soummya Mallick; Sung-Ho Park; Rajesh B. Patel; Michael Putrino
    • G06F9/302; G06F9/38
    • G06F9/3001; G06F9/3836; G06F9/384
    • A superscalar processor and method for executing fixed-point instructions within a superscalar processor are disclosed. The superscalar processor has a memory and multiple execution units, including a fixed point execution unit (FXU) and a non-fixed point execution unit (non-FXU). According to the present invention, a set of instructions to be executed are fetched from among a number of instructions stored within memory. A determination is then made if n instructions, the maximum number possible, can be dispatched to the multiple execution units during a first processor cycle if fixed point arithmetic and logical instructions are dispatched only to the FXU. If so, n instructions are dispatched to the multiple execution units for execution. In response to a determination that n instructions cannot be dispatched during the first processor cycle, a determination is made whether a fixed point instruction is available to be dispatched and whether dispatching the fixed point instruction to the non-FXU for execution will result in greater efficiency. In response to a determination that a fixed point instruction is not available to be dispatched or that dispatching the fixed point instruction to the non-FXU will not result in greater efficiency, dispatch of the fixed point instruction is delayed until a second processor cycle. However, in response to a determination that dispatching the fixed point instruction to the non-FXU will result in greater efficiency, the fixed point instruction is dispatched to the non-FXU and executed, thereby improving execution unit utilization.
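The dispatch decision in this abstract can be sketched as a two-pass assignment. This is a simplified model, not the patented logic: the instruction kinds ("fx", "other"), the slot counts, and the efficiency test (here simply "a non-FXU slot is still idle after normal dispatch") are all assumptions, and program-order constraints are ignored.

```python
# Simplified model of one dispatch cycle.
# Assumption: "greater efficiency" means a non-FXU slot would otherwise
# stay idle this cycle. The abstract does not define the criterion.

def dispatch_cycle(window, fxu_slots, non_fxu_slots):
    """Return (assignments, delayed) for one processor cycle.

    window: instruction kinds, each "fx" (fixed-point) or "other".
    """
    assignments = []
    delayed = []
    # Pass 1: fixed-point instructions go only to the FXU.
    for kind in window:
        if kind == "fx" and fxu_slots > 0:
            fxu_slots -= 1
            assignments.append((kind, "FXU"))
        elif kind == "other" and non_fxu_slots > 0:
            non_fxu_slots -= 1
            assignments.append((kind, "non-FXU"))
        else:
            delayed.append(kind)
    # Pass 2: route waiting fixed-point instructions to idle non-FXU slots.
    still_delayed = []
    for kind in delayed:
        if kind == "fx" and non_fxu_slots > 0:
            non_fxu_slots -= 1
            assignments.append((kind, "non-FXU"))
        else:
            still_delayed.append(kind)  # waits for the next cycle
    return assignments, still_delayed
```

For example, two fixed-point instructions with one FXU slot and one idle non-FXU slot both dispatch in the same cycle, whereas an occupied non-FXU leaves the second one delayed.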
    • 6. Granted invention patent
    • Method for executing speculative load instructions in high-performance processors
    • US5611063A
    • 1997-03-11
    • US597647
    • 1996-02-06
    • Albert J. Loper; Soummya Mallick; Michael Putrino
    • G06F9/312; G06F9/38; G06F9/30
    • G06F9/30043; G06F9/383; G06F9/3842
    • A method for selectively executing speculative load instructions in a high-performance processor is disclosed. In accordance with the present disclosure, when a speculative load instruction for which the data is not stored in a data cache is encountered, a bit within an enable speculative load table which is associated with that particular speculative load instruction is read in order to determine a state of the bit. If the associated bit is in a first state, data for the speculative load instruction is requested from a system bus and further execution of the speculative load instruction is then suspended to wait for control signals from a branch processing unit. If the associated bit is in a second state, the execution of the speculative load instruction is immediately suspended to wait for control signals from the branch processing unit. If the speculative load instruction is executed in response to the control signals, then the associated bit in the enable speculative load table will be set to the first state. However, if the speculative load instruction is not executed in response to the control signals, then the associated bit in the enable speculative load table is set to the second state. In this manner, the displacement of useful data in the data cache due to wrongful execution of the speculative load instruction is avoided.
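The enable speculative load table behaves like a one-bit predictor per load, which the following sketch models. The table size, the indexing by low-order instruction address bits, the initial state, and the class and method names are assumptions; the abstract specifies only the two states and how they are updated.

```python
# Minimal model of an "enable speculative load" table.
# Assumptions: 64 entries, indexed by word-aligned instruction address,
# True = "first state" (request data from the bus on a miss),
# False = "second state" (suspend without a bus request).

class SpeculativeLoadTable:
    def __init__(self, size=64, initial=True):
        self.size = size
        self.bits = [initial] * size

    def _index(self, pc):
        return (pc >> 2) % self.size

    def request_on_miss(self, pc):
        """On a data-cache miss, report whether to request the data
        from the system bus before suspending the load."""
        return self.bits[self._index(pc)]

    def branch_resolved(self, pc, load_executed):
        """Update the bit once the branch unit resolves the speculation."""
        self.bits[self._index(pc)] = load_executed
```

A load whose speculation was wrong last time stops triggering bus requests, which is how the scheme avoids displacing useful cache data.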
    • 7. Granted invention patent
    • Processor having vector processing capability and method for executing a vector instruction in a processor
    • US06324638B1
    • 2001-11-27
    • US09282268
    • 1999-03-31
    • Thomas Elmer; Michael Putrino
    • G06F1517
    • G06F7/5324; G06F7/5332; G06F9/30014; G06F9/30036; G06F2207/382; G06F2207/3828
    • A processor capable of executing vector instructions includes at least an instruction sequencing unit and a vector processing unit that receives vector instructions to be executed from the instruction sequencing unit. The vector processing unit includes a plurality of multiply structures, each containing only a single multiply array, that each correspond to at least one element of a vector input operand. Utilizing the single multiply array, each of the plurality of multiply structures is capable of performing a multiplication operation on one element of a vector input operand and is also capable of performing a multiplication operation on multiple elements of a vector input operand concurrently. In an embodiment in which the maximum length of an element of a vector input operand is N bits, each of the plurality of multiply arrays can handle both N by N bit integer multiplication and M by M bit integer multiplication, where N is a non-unitary integer multiple of M. At least one of the multiply structures also preferably includes an accumulating adder that receives as a first input a result produced by that multiply structure and receives as a second input a result produced by another multiply structure. From these inputs, the accumulating adder produces as an output an accumulated sum of the results in response to execution of the same instruction that caused the multiply structures to produce the intermediate results.
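One way to see how a single multiply array could serve both N by N and M by M multiplication is to build the wide product from narrow partial products. The sketch below assumes N = 32, M = 16, and unsigned operands; it illustrates the arithmetic only and makes no claim about the hardware structure in the patent. All function names are hypothetical.

```python
# Arithmetic sketch: one set of M x M partial products used two ways.
# Assumptions: N = 2 * M, unsigned integers.

M = 16
N = 2 * M
MASK = (1 << M) - 1

def mul_m(a, b):
    """Model of one M x M partial-product block."""
    if not (0 <= a <= MASK and 0 <= b <= MASK):
        raise ValueError("operands must fit in M bits")
    return a * b

def mul_n(a, b):
    """N x N product assembled from four M x M partial products."""
    a_hi, a_lo = a >> M, a & MASK
    b_hi, b_lo = b >> M, b & MASK
    high = mul_m(a_hi, b_hi) << (2 * M)
    cross = (mul_m(a_hi, b_lo) + mul_m(a_lo, b_hi)) << M
    low = mul_m(a_lo, b_lo)
    return high + cross + low

def mul_packed(a, b):
    """Treat each N-bit operand as two M-bit elements and multiply
    them element-wise; the cross terms are simply not used."""
    return mul_m(a >> M, b >> M), mul_m(a & MASK, b & MASK)
```

In packed mode only the two diagonal partial products are needed, which is why the same array can multiply several narrow elements concurrently.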
    • 9. Granted invention patent
    • System for completing instruction out-of-order which performs target address comparisons prior to dispatch
    • US6098168A
    • 2000-08-01
    • US46867
    • 1998-03-24
    • Lee Evan Eisen; Michael Putrino
    • G06F9/38
    • G06F9/3842; G06F9/3836; G06F9/384; G06F9/3855; G06F9/3857
    • A mechanism structured to check for instruction collisions at the Dispatch Unit rather than the Completion Unit. In processors which issue multiple commands simultaneously, a flag bit is sent to the Completion Unit and attached to the instruction in the queue that follows the other in program order if they both have the same targeted address. When the instructions from position 1 and position 2 of the instruction queue are ready to issue, the Completion Unit checks position 2 for a flag bit. If there is a bit, then the instruction in position 1 is discarded and the instruction in position 2 is written to the target address. If there is no flag bit with the instruction in position 2, the instruction in position 1 is written to the target register. This method eliminates the need to compare all the targeted addresses that are associated with the rename registers. It requires two comparisons instead of a minimum of 15 comparisons.
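The flag-bit mechanism can be sketched for a pair of instructions dispatched together. The dictionary representation, the function names, and the behaviour when no flag is set (both results are written) are assumptions made for illustration; the abstract describes only the flag check at completion.

```python
# Sketch of collision detection at dispatch for a two-instruction pair.
# Assumption: instructions are dicts with "target" and "value" keys.

def dispatch_pair(first, second):
    """Compare targets once at dispatch and flag the later instruction."""
    flagged = dict(second, flag=(first["target"] == second["target"]))
    return first, flagged

def complete_pair(regs, first, second):
    """Retire the pair, returning the list of registers actually written."""
    written = []
    if second["flag"]:
        # Position 1 would be overwritten immediately, so it is discarded.
        regs[second["target"]] = second["value"]
        written.append(second["target"])
    else:
        regs[first["target"]] = first["value"]
        regs[second["target"]] = second["value"]
        written.extend([first["target"], second["target"]])
    return written
```

Because the comparison happens once at dispatch, the completion step only has to test a single bit instead of comparing target addresses across the rename registers.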
    • 10. Granted invention patent
    • Method and system for fast determination of sticky and guard bits
    • US5805487A
    • 1998-09-08
    • US677843
    • 1996-07-12
    • Timothy Alan Elliott; Christopher Hans Olson; Michael Putrino
    • G06F7/38; G06F7/00; G06F7/483; G06F7/57; G06F7/76; G06F7/48
    • G06F7/483; G06F7/49952; G06F7/49957
    • A method and system for fast calculation of the sticky bit and a function of the guard bit is disclosed. A first aspect of the method and system provides a fast calculation of the sticky bit. A second aspect provides a fast calculation of a function of the guard bit. Both aspects comprise means for providing an intermediate result of a floating point mathematical operation involving at least a first and a second operand and means for providing a mask indicating a position of a leading one in a mantissa of the intermediate result. In the first aspect, means for aligning a first bit of the mask to an (n+2)nd bit of the intermediate result, where n is the number of bits in a mantissa of the first or second operand, are coupled to the intermediate result providing means. In the second aspect, means for aligning a first bit of the mask to an (n+1)st bit of the intermediate result are coupled to the intermediate result providing means. In both aspects, means for providing an output are coupled to the aligning means and intermediate result providing means. The output of the first aspect comprises the sticky bit. The output of the second aspect comprises a function of the guard bit. Thus, the method and system allow the sticky bit and a function of the guard bit to be calculated substantially simultaneously with normalization. Because the method and system allow fast determination of the sticky bit and a function of the guard bit, the overall speed of the calculation is increased and system performance is improved.
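The idea of deriving the guard and sticky bits from the position of the leading one, without first normalizing the intermediate result, can be sketched as follows. This is a software illustration under assumptions: the intermediate result is a positive integer, the guard bit is taken as the (n+1)st bit counted from the leading one, and the sticky bit is the OR of everything from the (n+2)nd bit down. The abstract speaks of "a function of the guard bit"; the sketch returns the guard bit itself.

```python
# Guard and sticky bits computed from the leading-one position.
# Assumption: x is the unnormalized intermediate mantissa as a positive
# integer and n is the number of mantissa bits that will be kept.

def guard_and_sticky(x, n):
    """Return (guard, sticky) for keeping the top n bits of x."""
    if x <= 0 or n <= 0:
        raise ValueError("x and n must be positive")
    leading = x.bit_length() - 1   # position of the leading one
    guard_pos = leading - n        # (n+1)st bit below the leading one
    if guard_pos < 0:
        return 0, 0                # nothing is discarded
    guard = (x >> guard_pos) & 1
    low_mask = (1 << guard_pos) - 1  # covers the (n+2)nd bit and below
    sticky = int((x & low_mask) != 0)
    return guard, sticky
```

Since both values depend only on the leading-one position and the raw bits, they can be evaluated in parallel with the normalization shift rather than after it.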