    • 41. Granted Patent
    • Physical rename register for efficiently storing floating point, integer, condition code, and multimedia values
    • US06266763B1
    • 2001-07-24
    • US09225982
    • 1999-01-05
    • David B. Witt; James B. Keller
    • David B. Witt; James B. Keller
    • G06F9/312
    • G06F9/384; G06F9/3013; G06F9/3804; G06F9/3836; G06F9/3838; G06F9/3855; G06F9/3857
    • A register renaming apparatus includes one or more physical registers which may be assigned to store a floating point value, a multimedia value, an integer value and corresponding condition codes, or condition codes only. The classification of the instruction (e.g. floating point, multimedia, integer, flags-only) defines which lookahead register state is updated (e.g. floating point, integer, flags, etc.), but the physical register can be selected from the one or more physical registers for any of the instruction types. Determining if enough physical registers are free for assignment to the instructions being selected for dispatch includes considering the number of instructions selected for dispatch and the number of free physical registers, but excludes the data type of the instruction. When a code sequence includes predominately instructions of a particular data type, many of the physical registers may be assigned to that data type (efficiently using the physical register resource). By contrast, if different sets of physical registers are provided for different data types, only the physical registers used for the particular data type may be used for the aforementioned code sequence. Additional efficiencies may be realized in embodiments in which an integer register and condition codes are both updated by many instructions. One physical register may concurrently represent the architected state of both the flags register and the integer register. Accordingly, a given functional unit may forward a single physical register number for both results.
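A minimal Python sketch (all names hypothetical, not from the patent) of the dispatch-time check the abstract describes: free physical registers form a single pool shared by all data types, and the check compares only counts, never the data type of each instruction.

```python
class UnifiedRenameFile:
    """Single pool of physical registers shared by all data types."""

    def __init__(self, num_physical_regs):
        # Any free physical register may hold a floating point, multimedia,
        # integer, or flags-only value.
        self.free_list = list(range(num_physical_regs))

    def can_dispatch(self, num_ops):
        # Considers only the number of ops selected for dispatch and the
        # number of free registers; data types are excluded from the check.
        return len(self.free_list) >= num_ops

    def assign(self, op_types):
        """op_types: e.g. ["fp", "int", "flags"]. Returns a mapping of op
        index -> physical register number, or None to stall dispatch."""
        if not self.can_dispatch(len(op_types)):
            return None
        return {i: self.free_list.pop() for i in range(len(op_types))}
```

Because the pool is unified, a code sequence dominated by one data type can consume nearly all physical registers, rather than only a per-type subset.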
    • 43. Granted Patent
    • Processor configured to map logical register numbers to physical register numbers using virtual register numbers
    • US06247106B1
    • 2001-06-12
    • US09626556
    • 2000-07-27
    • David B. Witt
    • David B. Witt
    • G06F12/08
    • G06F9/3863; G06F9/3838; G06F9/384; G06F9/3857
    • A processor employing a map unit including register renaming hardware is shown. The map unit may assign virtual register numbers to source registers by scanning instruction operations to detect intraline dependencies. Subsequently, physical register numbers are mapped to the source register numbers responsive to the virtual register numbers. The map unit may store (e.g. in a map silo) a current lookahead state corresponding to each line of instruction operations processed by the map unit. Additionally, the map unit stores an indication of which instruction operations within the line update logical registers, which logical registers are updated, and the physical register numbers assigned to the instruction operations. Upon detection of an exception condition for an instruction operation within a line, the current lookahead state corresponding to the line is restored from the map silo. Additionally, physical register numbers corresponding to instruction operations within the line which are prior to the instruction operation experiencing the exception are restored into the current lookahead state. The map unit may use the same physical register to store both a condition code result and an integer result. The physical register number identifying the physical register is recorded for both the condition code register and the integer register. The map unit pops the previous renames from the architected renames block upon retiring one or more instruction operations. The popped physical register numbers are CAMmed against the updated architectural state. If a CAM match is detected, the popped physical register is not freed.
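A minimal Python sketch (hypothetical names, not from the patent) of the intraline dependency scan: each destination in a line receives a fresh virtual register number, and a source either reads the newest intraline writer's number or falls back to the current lookahead state.

```python
def assign_virtual_regs(line):
    """line: list of (dest_logical_reg, [src_logical_regs]).
    Each destination gets a fresh virtual register number (VRN); a source
    written earlier in the same line reads that writer's VRN, otherwise it
    is tagged to read the current lookahead state."""
    next_vrn = 0
    last_writer = {}  # logical reg -> VRN of newest intraline writer
    result = []
    for dest, srcs in line:
        src_tags = [last_writer.get(s, ("lookahead", s)) for s in srcs]
        result.append((next_vrn, src_tags))
        last_writer[dest] = next_vrn
        next_vrn += 1
    return result
```

Physical register numbers can then be mapped to the sources in a later stage, responsive to these virtual numbers, as the abstract outlines.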
    • 44. Granted Patent
    • Cumulative lookahead to eliminate chained dependencies
    • US06240503B1
    • 2001-05-29
    • US09190809
    • 1998-11-12
    • David B. Witt
    • David B. Witt
    • G06F9/30
    • G06F9/3806; G06F9/30101; G06F9/3017; G06F9/383; G06F9/3836; G06F9/3838; G06F9/384; G06F9/3857
    • A processor is configured to generate lookahead values using a cumulative constant. The processor classifies operations to a particular register (e.g. the stack pointer register, or ESP in an embodiment employing the x86 instruction set architecture) as either accelerated or non-accelerated. For example, instructions which are defined to increment/decrement the particular register by an explicit or implicit constant value may be accelerated operations. Upon the occurrence of a non-accelerated operation, the processor may begin accumulating the cumulative effect of accelerated operations to the result of the non-accelerated operation as a cumulative offset. The result of the non-accelerated operation (upon execution thereof) may then be added to the cumulative offset values corresponding to each accelerated operation to generate the particular register value corresponding to that accelerated operation. Accordingly, dependencies upon the register due to the accelerated operations may be alleviated. Accelerated operations may execute in parallel upon provision of the value generated by the non-accelerated operations. The cumulative value may be maintained across multiple cycles of instruction dispatch, thereby allowing for dependency alleviation across the multiple cycles of instruction dispatch.
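A minimal Python sketch (hypothetical names, not from the patent) of the cumulative-offset idea: accelerated operations that adjust the register by a constant (e.g. push/pop against ESP) each record a running offset from the last non-accelerated result, so they no longer chain on one another.

```python
def cumulative_esp_offsets(ops):
    """ops: list of ("acc", delta) for accelerated ops that adjust the
    register by an explicit or implicit constant, or ("nonacc", None) for
    ops whose result must come from execution. Returns, per op, the
    cumulative offset to add to the most recent non-accelerated result
    (None for the non-accelerated ops themselves)."""
    offsets, cum = [], 0
    for kind, delta in ops:
        if kind == "nonacc":
            cum = 0           # accumulation restarts at a new base value
            offsets.append(None)
        else:
            cum += delta      # fold this constant into the running offset
            offsets.append(cum)
    return offsets
```

Once the non-accelerated result is produced, every accelerated operation's register value is just base + its recorded offset, so they can all execute in parallel; the running value can also be carried across dispatch cycles.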
    • 45. Granted Patent
    • Superscalar microprocessor employing a data cache capable of performing store accesses in a single clock cycle
    • US06189068B1
    • 2001-02-13
    • US09342071
    • 1999-06-28
    • David B. Witt; Rajiv M. Hattangadi
    • David B. Witt; Rajiv M. Hattangadi
    • G06F15/76
    • G06F12/0888; G06F9/3004; G06F9/30043; G06F9/30087; G06F9/3824; G06F9/3828; G06F9/3832; G06F9/3834; G06F9/3842; G06F12/0855; G06F12/0864; G06F2212/6082
    • A superscalar microprocessor employing a data cache configured to perform store accesses in a single clock cycle is provided. The superscalar microprocessor speculatively stores data within a predicted way of the data cache after capturing the data currently being stored in that predicted way. During a subsequent clock cycle, the cache hit information for the store access validates the way prediction. If the way prediction is correct, then the store is complete, utilizing a single clock cycle of data cache bandwidth. Additionally, the way prediction structure implemented within the data cache bypasses the tag comparisons of the data cache to select data bytes for the output. Therefore, the access time of the associative data cache may be substantially similar to a direct-mapped cache access time. The superscalar microprocessor may therefore be capable of high frequency operation.
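A minimal Python sketch (hypothetical data layout, not from the patent) of the single-cycle store: the data currently in the predicted way is captured, the new data is written speculatively, and the next cycle's tag comparison either confirms the store or restores the captured data.

```python
def store_with_way_prediction(data_ways, tag_ways, index, predicted_way,
                              store_tag, new_data):
    """Speculatively write into the predicted way after capturing the data
    currently stored there; the tag comparison in the following cycle
    validates the prediction. Returns True when the store completed using
    a single cycle of data-cache bandwidth."""
    captured = data_ways[predicted_way][index]   # saved for recovery
    data_ways[predicted_way][index] = new_data   # speculative store
    if tag_ways[predicted_way][index] == store_tag:
        return True                              # way prediction correct
    data_ways[predicted_way][index] = captured   # mispredict: undo write
    return False
```

The same way-prediction structure lets loads bypass the tag comparison when selecting output bytes, which is why the set-associative access time can approach that of a direct-mapped cache.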
    • 46. Granted Patent
    • Reorder buffer employed in a microprocessor to store instruction results having a plurality of entries predetermined to correspond to a plurality of functional units
    • US6134651A
    • 2000-10-17
    • US458816
    • 1999-12-10
    • David B. Witt; Thang M. Tran
    • David B. Witt; Thang M. Tran
    • G06F9/38; G06F9/46; G06F12/12; G06F15/00
    • G06F9/52; G06F12/126; G06F9/30036; G06F9/382; G06F9/3836; G06F9/3838; G06F9/384; G06F9/3853; G06F9/3855; G06F9/3857; G06F9/3885
    • A reorder buffer is configured into multiple lines of storage, wherein a line of storage includes sufficient storage for instruction results regarding a predefined maximum number of concurrently dispatchable instructions. A line of storage is allocated whenever one or more instructions are dispatched. A microprocessor employing the reorder buffer is also configured with fixed, symmetrical issue positions. The symmetrical nature of the issue positions may increase the average number of instructions to be concurrently dispatched and executed by the microprocessor. The average number of unused locations within the line decreases as the average number of concurrently dispatched instructions increases. One particular implementation of the reorder buffer includes a future file. The future file comprises a storage location corresponding to each register within the microprocessor. The reorder buffer tag (or instruction result, if the instruction has executed) of the last instruction in program order to update the register is stored in the future file. The reorder buffer provides the value (either reorder buffer tag or instruction result) stored in the storage location corresponding to a register when the register is used as a source operand for another instruction. Another advantage of the future file for microprocessors which allow access and update to portions of registers is that narrow-to-wide dependencies are resolved upon completion of the instruction which updates the narrower register.
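A minimal Python sketch (hypothetical names, not from the patent) of the future file: one entry per architected register holds either a value or the reorder-buffer tag of the newest in-program-order pending writer, and a completing instruction updates the entry only if it is still that newest writer.

```python
class FutureFile:
    """One entry per architected register: ("value", v) or ("tag", t),
    where t is the ROB tag of the newest pending writer."""

    def __init__(self, num_regs):
        self.entries = [("value", 0)] * num_regs

    def dispatch(self, dest_reg, rob_tag):
        # The dispatching instruction becomes the newest writer.
        self.entries[dest_reg] = ("tag", rob_tag)

    def complete(self, dest_reg, rob_tag, result):
        # Capture the result only if this instruction is still the
        # newest writer; a stale writer must not clobber the entry.
        if self.entries[dest_reg] == ("tag", rob_tag):
            self.entries[dest_reg] = ("value", result)

    def read(self, src_reg):
        # A dependent instruction receives either the operand value or
        # the tag to wait on.
        return self.entries[src_reg]
```

This is the lookup the abstract describes for source operands: the reorder buffer supplies whichever of tag or result is currently stored for the register.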
    • 47. Granted Patent
    • Method for deriving a double frequency microprocessor from an existing microprocessor
    • US6081656A
    • 2000-06-27
    • US884431
    • 1997-06-27
    • David B. Witt
    • David B. Witt
    • G06F9/38; G06F17/50
    • G06F9/3867; G06F17/5045; G06F9/3869
    • A first microprocessor having a PH1/PH2 pipeline structure is designed. The first microprocessor undergoes a design cycle including microarchitecture, design (e.g. logic design, circuit design, and layout work), verification, qualification, and volume manufacture. Subsequently, a second microprocessor is derived from the first microprocessor by replacing the PH1 and PH2 latches with edge triggered flip flops connected to a clock line which is operable at approximately twice the frequency of the clock signals used in the PH1/PH2 pipeline. A minimal design effort may be employed to produce the second microprocessor. The microarchitecture of the second microprocessor is quite similar to the microarchitecture of the first microprocessor. Still further, much of the design, verification, and qualification work performed for the first microprocessor may be reused for the second microprocessor.
    • 48. Granted Patent
    • Microprocessor including virtual address branch prediction and current page register to provide page portion of virtual and physical fetch address
    • US6079005A
    • 2000-06-20
    • US975224
    • 1997-11-20
    • David B. Witt; Thang M. Tran
    • David B. Witt; Thang M. Tran
    • G06F9/32; G06F9/38; G06F12/10
    • G06F9/3806; G06F12/1054; G06F9/30058
    • A microprocessor employs a branch prediction unit including a branch prediction storage which stores the index portion of branch target addresses and an instruction cache which is virtually indexed and physically tagged. The branch target index (if predicted-taken), or the sequential index (if predicted not-taken) is provided as the index to the instruction cache. The selected physical tag is provided to a reverse translation lookaside buffer (TLB) which translates the physical tag to a virtual page number. Concatenating the virtual page number to the virtual index from the instruction cache (and the offset portion, generated from the branch prediction) results in the branch target address being generated. In one embodiment, a current page register stores the most recently translated virtual page number and the corresponding real page number. The branch prediction unit predicts that each fetch address will continue to reside in the current page and uses the virtual page number from the current page to form the branch target address. The physical tag from the fetched cache line is compared to the corresponding real page number to verify that the fetch address is actually still within the current page. When a mismatch is detected between the corresponding real page number and the physical tag from the fetched cache line, the branch target address is corrected with the linear page number provided by the reverse TLB and the current page register is updated.
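A minimal Python sketch (hypothetical names and a 4 KB page size assumed, not from the patent) of the current-page mechanism: the branch target is formed with the current page's virtual page number, then verified against the fetched line's physical tag; on a mismatch the reverse TLB supplies the correct virtual page and the current page register is updated.

```python
PAGE_SHIFT = 12  # assumed 4 KB pages

def fetch_and_verify(current_vpn, current_rpn, fetched_physical_tag,
                     reverse_tlb, index_and_offset):
    """Form the branch target from the current page's virtual page number,
    then check that the fetch really stayed within the current page by
    comparing the fetched line's physical tag with the current real page
    number. Returns (target, (vpn, rpn)) for the updated current page."""
    target = (current_vpn << PAGE_SHIFT) | index_and_offset
    if fetched_physical_tag == current_rpn:
        return target, (current_vpn, current_rpn)    # prediction held
    # Mismatch: recover the virtual page via the reverse TLB and correct
    # the branch target; the current page register tracks the new page.
    corrected_vpn = reverse_tlb[fetched_physical_tag]
    corrected = (corrected_vpn << PAGE_SHIFT) | index_and_offset
    return corrected, (corrected_vpn, fetched_physical_tag)
```

The common case (fetches staying within one page) thus avoids a forward TLB lookup on the fetch path.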
    • 49. Granted Patent
    • Floating point unit using a central window for storing instructions capable of executing multiple instructions in a single clock cycle
    • US6018798A
    • 2000-01-25
    • US993477
    • 1997-12-18
    • David B. Witt; Derrick R. Meyer
    • David B. Witt; Derrick R. Meyer
    • G06F9/38; G06F9/30
    • G06F9/3885; G06F9/383; G06F9/3836; G06F9/384; G06F9/3855; G06F9/3857
    • A floating point unit capable of executing multiple instructions in a single clock cycle using a central window and a register map is disclosed. The floating point unit comprises: a plurality of translation units, a future file, a central window, a plurality of functional units, a result queue, and a plurality of physical registers. The floating point unit receives speculative instructions, decodes them, and then stores them in the central window. Speculative top of stack values are generated for each instruction during decoding. Top of stack relative operands are computed to physical registers using a register map. Register stack exchange operations are performed during decoding. Instructions are then stored in the central window, which selects the oldest stored instructions to be issued to each functional pipeline and issues them. Conversion units convert the instruction's operands to an internal format, and normalization units detect and normalize any denormal operands. Finally, the functional pipelines execute the instructions.
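A minimal Python sketch (hypothetical names, not from the patent) of two decode-time steps the abstract mentions: resolving top-of-stack-relative operands ST(i) to physical registers through the register map, and performing a register stack exchange (as in x87 FXCH) by swapping map entries rather than moving data.

```python
def resolve_st(register_map, tos, i):
    """Resolve operand ST(i), relative to the speculative top of stack,
    to a physical register via the 8-entry register map."""
    return register_map[(tos + i) % 8]

def fxch(register_map, tos, i):
    """Register stack exchange handled during decoding: swap the two map
    entries instead of exchanging register contents."""
    a, b = tos % 8, (tos + i) % 8
    register_map[a], register_map[b] = register_map[b], register_map[a]
```

After decode, the instructions carry physical register numbers, so the central window can issue the oldest ready instructions to each functional pipeline without stack bookkeeping.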
    • 50. Granted Patent
    • Pre-decoded instruction cache and method therefor particularly suitable for variable byte-length instructions
    • US5970235A
    • 1999-10-19
    • US951803
    • 1997-10-16
    • David B. Witt; Michael D. Goddard
    • David B. Witt; Michael D. Goddard
    • G06F12/08; G06F9/30; G06F9/318; G06F9/32; G06F9/38
    • G06F9/382; G06F9/30152; G06F9/30174; G06F9/3816
    • An instruction cache for a superscalar processor having a variable byte-length instruction format, such as the X86 format, is organized as a 16K byte 4-way set-associative cache. An instruction store array is organized as 1024 blocks of 16 predecoded instruction bytes. The instruction bytes are prefetched and predecoded to facilitate the subsequent parallel decoding and mapping of up to four instructions into a sequence of one or more internal RISC-like operations (ROPs), and the parallel dispatch of up to 4 ROPs by an instruction decoder. Predecode bits are assigned to each instruction byte and are stored with the corresponding instruction byte in the instruction store array. The predecode bits include bits for identifying the starting, ending, and opcode bytes, and for specifying the number of ROPs that an instruction maps into. An address tag array is dual-ported and contains 1024 entries, each composed of a 20-bit address tag, a single valid bit for the entire block, and 16 individual byte-valid bits, one for each of the 16 corresponding instruction bytes within the instruction store array. A successor array is dual-ported and contains 1024 entries, each composed of a 14-bit successor index, a successor valid bit which indicates that the successor index stored in the successor array should be used to access the instruction store array or that no branch is predicted taken within the instruction block, and a block branch index which indicates the byte location within the current instruction block of the last instruction byte predicted to be executed.
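A minimal Python sketch (hypothetical representation, not from the patent) of the predecode step: for one cache block of variable byte-length instructions, per-byte start and end bits mark instruction boundaries, and the number of ROPs each instruction maps into is stored with its start byte.

```python
def predecode_block(instr_lengths, rop_counts, block_size=16):
    """Compute per-byte predecode bits for one block: start and end bits
    marking instruction boundaries, plus the ROP count recorded at each
    start byte (0 elsewhere)."""
    start = [0] * block_size
    end = [0] * block_size
    rops = [0] * block_size
    pos = 0
    for length, nrops in zip(instr_lengths, rop_counts):
        if pos + length > block_size:
            break                      # instruction spills into next block
        start[pos] = 1                 # first byte of the instruction
        rops[pos] = nrops              # ROPs this instruction maps into
        end[pos + length - 1] = 1      # last byte of the instruction
        pos += length
    return start, end, rops
```

With boundaries and ROP counts precomputed at fill time, the decoder can locate up to four instructions in parallel without rescanning the raw bytes.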