    • 3. Granted Invention Patent
    • Prefetch instruction specifying destination functional unit and read/write access mode
    • US06321326B1
    • 2001-11-20
    • US09569102
    • 2000-05-10
    • David B. Witt
    • David B. Witt
    • G06F15/00
    • G06F9/383; G06F9/3885; G06F12/0848; G06F12/0862; G06F2212/6028
    • A microprocessor is configured to execute a prefetch instruction specifying a cache line to be transferred into the microprocessor, as well as an access mode for the cache line. The microprocessor includes caches optimized for the access modes. In one embodiment, the microprocessor includes functional units configured to operate upon various data types. Each different type of functional unit may be connected to different caches which are optimized for the various access modes. The prefetch instruction may include a functional unit specification in addition to the access mode. In this manner, data of a particular type may be prefetched into a cache local to a particular functional unit.
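The abstract above can be illustrated with a toy Python model of a prefetch instruction that names both a destination functional unit and an access mode, so the line lands in the cache local to that unit. All class, field, and unit names here are illustrative assumptions, not taken from the patent.

```python
# Toy model: a prefetch instruction carrying a destination functional
# unit and an access mode. Names are illustrative, not from the patent.
class Cache:
    def __init__(self, name):
        self.name = name
        self.lines = {}            # address -> (data, access_mode)

    def fill(self, addr, data, mode):
        self.lines[addr] = (data, mode)

class Microprocessor:
    def __init__(self):
        # one cache per functional-unit type, each optimized differently
        self.unit_caches = {"integer": Cache("int-L0"),
                            "float": Cache("fp-L0")}
        self.memory = {0x100: b"line-data"}

    def prefetch(self, addr, unit, mode):
        """PREFETCH addr, unit, mode -- pull a line toward `unit`."""
        cache = self.unit_caches[unit]
        cache.fill(addr, self.memory.get(addr), mode)
        return cache.name

cpu = Microprocessor()
dest = cpu.prefetch(0x100, unit="float", mode="read")
print(dest)                                      # fp-L0
print(0x100 in cpu.unit_caches["float"].lines)   # True
```

The point of the sketch is the routing step: the instruction's unit field, not the address, decides which cache receives the fill.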
    • 4. Granted Invention Patent
    • Register renaming in which moves are accomplished by swapping tags
    • US06256721B1
    • 2001-07-03
    • US09595726
    • 2000-06-16
    • David B. Witt
    • David B. Witt
    • G06F9/38
    • G06F9/3842; G06F9/30032; G06F9/30054; G06F9/30069; G06F9/30152; G06F9/322; G06F9/324; G06F9/3806; G06F9/3816; G06F9/382; G06F9/383; G06F9/384
    • An apparatus for accelerating move operations includes a lookahead unit which detects move instructions prior to the execution of the move instructions (e.g. upon selection of the move operations for dispatch within a processor). Upon detecting a move instruction, the lookahead unit signals a register rename unit, which reassigns the rename register associated with the source register to the destination register. In one particular embodiment, the lookahead unit attempts to accelerate moves from a base pointer register to a stack pointer register (and vice versa). An embodiment of the lookahead unit generates lookahead values for the stack pointer register by maintaining cumulative effects of the increments and decrements of previously dispatched instructions. The cumulative effects of the increments and decrements prior to a particular instruction may be added to a previously generated value of the stack pointer register to generate a lookahead value for that particular instruction. For such an embodiment, reassigning the rename register as described above may thereby provide a valid value for the stack pointer register, and hence may allow for the generation of lookahead stack pointer values for instructions subsequent to the move instruction to proceed prior to execution of the move instruction. The present embodiment of the register rename unit may also assign the destination rename register selected for the move instruction to the source register of the move instruction (i.e. the rename tags for the source and destination are “swapped”).
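The tag-swap idea in the abstract can be sketched in a few lines of Python: instead of copying data to execute `MOV dst, src`, the rename map exchanges the physical-register tags of the two architectural registers. The register and tag names below are illustrative, not from the patent.

```python
# Minimal sketch of move acceleration by swapping rename tags instead
# of copying a value between physical registers. Illustrative only.
def execute_move_by_tag_swap(rename_map, dst, src):
    # Reassign src's rename register to dst, and give dst's old rename
    # register back to src: the two tags are "swapped".
    rename_map[dst], rename_map[src] = rename_map[src], rename_map[dst]

# architectural register -> physical (rename) register tag
rename_map = {"ESP": "p7", "EBP": "p3"}

# MOV ESP, EBP -- no data movement, just a tag exchange
execute_move_by_tag_swap(rename_map, dst="ESP", src="EBP")
print(rename_map)    # {'ESP': 'p3', 'EBP': 'p7'}
```

After the swap, ESP reads the value EBP held (via tag `p3`) without any physical register ever being written, which is why the move can complete before execution.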
    • 5. Granted Invention Patent
    • Linearly addressable microprocessor cache
    • US06240484B1
    • 2001-05-29
    • US08971805
    • 1997-11-17
    • David B. Witt
    • David B. Witt
    • G06F12/10
    • G06F12/1063
    • A microprocessor conforming to the X86 architecture is disclosed which includes a linearly addressable cache, thus allowing the cache to be quickly accessed by an external bus while allowing fast translation to a logical address for operation with functional units of the microprocessor. Also disclosed is a microprocessor which includes a linear tag array and a physical tag array corresponding to the linear tag array, thus allowing the contents of a microprocessor cache to be advantageously monitored from an external bus without slowing the main instruction and data access processing paths.
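The dual tag arrays described above can be modeled as two lookup tables pointing at the same data array: internal accesses probe the linear tags while external-bus snoops probe the physical tags, so neither path blocks the other. This is a minimal sketch with invented names, not the patented circuit.

```python
# Sketch of parallel linear and physical tag arrays over one data store.
# Names and the dict-based structure are illustrative assumptions.
class DualTagCache:
    def __init__(self):
        self.linear_tags = {}     # linear address -> cache index
        self.physical_tags = {}   # physical address -> cache index
        self.data = {}

    def fill(self, linear, physical, value):
        idx = len(self.data)
        self.linear_tags[linear] = idx
        self.physical_tags[physical] = idx
        self.data[idx] = value

    def cpu_access(self, linear):
        """Fast internal path: functional units use linear addresses."""
        return self.data.get(self.linear_tags.get(linear))

    def snoop(self, physical):
        """External-bus path: monitor contents via physical tags only."""
        return self.physical_tags.get(physical) is not None

cache = DualTagCache()
cache.fill(linear=0x1000, physical=0x9000, value="line")
print(cache.cpu_access(0x1000))   # line
print(cache.snoop(0x9000))        # True
```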
    • 6. Granted Invention Patent
    • Universal dependency vector/queue entry
    • US06212623B1
    • 2001-04-03
    • US09139178
    • 1998-08-24
    • David B. Witt
    • David B. Witt
    • G06F15/00
    • G06F9/3814; G06F9/3834; G06F9/3836; G06F9/3838; G06F9/384; G06F9/3857
    • A processor employs an instruction queue and dependency vectors therein which allow a flexible dependency recording structure. The dependency vector includes a dependency indication for each instruction queue entry, which may provide a universal mechanism for scheduling instruction operations. An arbitrary number of dependencies may be recorded for a given instruction operation, up to a dependency upon each other instruction operation. Since the dependency vector is configured to record an arbitrary number of dependencies, a given instruction operation can be ordered with respect to any other instruction operation. Accordingly, any architectural or microarchitectural restrictions upon concurrent execution or upon order of particular instruction operations in execution may be enforced. The instruction queues evaluate the dependency vectors and request scheduling for each instruction operation for which the recorded dependencies have been satisfied.
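A dependency vector with one bit per queue entry can be sketched as a bitmask check: an operation may request scheduling once every entry it depends on has completed. The encoding below is an illustrative assumption, not the patented implementation.

```python
# Sketch of dependency-vector scheduling: dep_vectors[i] is a bitmask
# over queue entries that entry i depends on. Illustrative only.
def ready_entries(dep_vectors, completed):
    """Return indices of entries whose recorded dependencies are all
    in the `completed` bitmask and may therefore request scheduling."""
    return [i for i, vec in enumerate(dep_vectors)
            if (vec & ~completed) == 0]

# entry 0: no deps; entry 1 depends on 0; entry 2 depends on 0 and 1
dep_vectors = [0b000, 0b001, 0b011]
print(ready_entries(dep_vectors, completed=0b000))  # [0]
print(ready_entries(dep_vectors, completed=0b001))  # [0, 1]
```

Because the vector has a bit for every other entry, any ordering constraint (architectural or microarchitectural) reduces to setting one more bit, which is the "universal mechanism" the abstract describes.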
    • 7. Granted Invention Patent
    • Pipelined data cache with multiple ports and processor with load/store unit selecting only load or store operations for concurrent processing
    • US06202139B1
    • 2001-03-13
    • US09100291
    • 1998-06-19
    • David B. Witt; James K. Pickett
    • David B. Witt; James K. Pickett
    • G06F13/00
    • G11C7/1039; G06F9/3824; G06F9/3869; G06F12/0853; G06F12/0855; G11C7/1072; G11C8/16
    • A computer system includes a processor having a cache which includes multiple ports, although a storage array included within the cache may employ fewer physical ports than the cache supports. The cache is pipelined and operates at a clock frequency higher than that employed by the remainder of a microprocessor including the cache. In one embodiment, the cache preferably operates at a clock frequency which is at least a multiple of the clock frequency at which the remainder of the microprocessor operates. The multiple is equal to the number of ports provided on the cache (or the ratio of the number of ports provided on the cache to the number of ports provided internally, if more than one port is supported internally). Accordingly, the accesses provided on each port of the cache during a clock cycle of the microprocessor clock can be sequenced into the cache pipeline prior to commencement of the subsequent clock cycle. In one particular embodiment, the load/store unit of the microprocessor is configured to select only load memory operations or only store memory operations for concurrent presentation to the data cache. Accordingly, the data cache may be performing only reads or only writes to its internal array during a clock cycle. The data cache may implement several techniques for accelerating access time based upon this feature. For example, the bit lines within the data cache array may be only balanced between accesses instead of precharging (and potentially balancing).
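The load/store selection rule in the abstract (only loads or only stores presented to the data cache in a given cycle) can be sketched as a simple same-kind filter over the pending queue. The tuple encoding and port count are illustrative assumptions, not the actual selection logic.

```python
# Sketch: pick up to `ports` memory operations for one data-cache
# cycle, all of the same kind, so the array does pure reads or pure
# writes that cycle. Illustrative structure only.
def select_for_cache(pending_ops, ports=2):
    """pending_ops: list of ('load'|'store', addr) in age order.
    The oldest op decides the kind; younger ops of the same kind
    fill the remaining cache ports."""
    if not pending_ops:
        return []
    kind = pending_ops[0][0]
    same_kind = [op for op in pending_ops if op[0] == kind]
    return same_kind[:ports]

ops = [("load", 0x10), ("store", 0x20), ("load", 0x30)]
print(select_for_cache(ops))   # [('load', 16), ('load', 48)]
```

Restricting a cycle to one kind is what lets the cache skip precharge and only balance bit lines between accesses, as the abstract notes.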
    • 8. Granted Invention Patent
    • Selecting cache to fetch in multi-level cache system based on fetch address source and pre-fetching additional data to the cache for future access
    • US06199154B1
    • 2001-03-06
    • US09099984
    • 1998-06-19
    • David B. Witt
    • David B. Witt
    • G06F9/06
    • G06F9/30021; G06F9/30058; G06F9/322; G06F9/324; G06F9/3804; G06F9/381; G06F9/382; G06F9/383; G06F9/3838; G06F9/3842; G06F12/0862; G06F12/0897
    • A processor employs a first instruction cache, a second instruction cache, and a fetch unit employing a fetch/prefetch method among the first and second instruction caches designed to provide high fetch bandwidth. The fetch unit selects a fetch address based upon previously fetched instructions (e.g. the existence or lack thereof of branch instructions within the previously fetched instructions) from a variety of fetch address sources. Depending upon the source of the fetch address, the fetch address is presented to one of the first and second instruction caches for fetching the corresponding instructions. If the first cache is selected to receive the fetch address, the fetch unit may select a prefetch address for presentation to the second cache. The prefetch address is selected from a variety of prefetch address sources and is presented to the second instruction cache. Instructions prefetched in response to the prefetch address are provided to the first instruction cache for storage. In one embodiment, the first instruction cache may be a low latency, relatively small cache while the second instruction cache may be a higher latency, relatively large cache. Fetch addresses from many of the fetch address sources may be likely to hit in the first instruction cache. Other fetch addresses may be less likely to hit in the first instruction cache. Accordingly, these fetch addresses may be immediately fetched from the second instruction cache, instead of first attempting to fetch from the first instruction cache.
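The routing decision in the abstract, where the source of the fetch address decides which instruction cache is accessed, can be sketched as a small dispatch function. The source names below are illustrative guesses at "likely to hit the small cache" sources, not a list from the patent.

```python
# Sketch: route a fetch address to the small or large instruction
# cache based on its source, optionally issuing a prefetch to the
# large cache. Source names are illustrative assumptions.
def route_fetch(source):
    """Return (fetch_cache, prefetch_cache_or_None)."""
    likely_small_cache_hit = {"sequential", "branch_predictor",
                              "return_stack"}
    if source in likely_small_cache_hit:
        # fetch the low-latency cache; prefetch ahead from the big one
        return ("L0", "L1")
    # sources unlikely to hit the small cache go straight to the
    # larger cache instead of wasting a lookup
    return ("L1", None)

print(route_fetch("branch_predictor"))   # ('L0', 'L1')
print(route_fetch("exception_vector"))   # ('L1', None)
```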
    • 9. Granted Invention Patent
    • Fully associate cache employing LRU groups for cache replacement and mechanism for selecting an LRU group
    • US6161167A
    • 2000-12-12
    • US884435
    • 1997-06-27
    • David B. Witt
    • David B. Witt
    • G06F9/38; G06F12/08; G06F12/12
    • G06F9/3824; G06F12/0897; G06F12/123
    • A microprocessor employs an L0 cache. The L0 cache is located physically near the execute units of the microprocessor and is relatively small in size as compared to a larger L1 data cache included within the microprocessor. The L0 cache is accessed for those memory operations for which an address is being conveyed to a load/store unit within the microprocessor during the clock cycle in which the memory operation is selected for access to the L1 data cache. The address corresponding to the memory operation is received by the L0 cache directly from the execute unit forming the address. If a hit in the L0 cache is detected, the L0 cache either forwards data or stores data corresponding to the memory operation (depending upon the type of the memory operation). The memory operation is conveyed to the L1 data cache in parallel with the memory operation accessing the L0 cache. If the memory operation misses in the L0 cache and hits in the L1 data cache, the cache line corresponding to the memory operation may be conveyed to the L0 cache as a line fill.
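The L0 miss / L1 hit line-fill behavior described above can be sketched with two dictionaries standing in for the caches. The hardware probes both levels in parallel; the sketch models that sequentially for clarity, and all names are illustrative.

```python
# Sketch of an L0 cache filled from the L1 data cache on an L0 miss
# that hits L1. Illustrative toy structure, not the patented design.
class TwoLevelDataCache:
    def __init__(self):
        self.l0 = {}                       # small, near execute units
        self.l1 = {0x40: "cache line"}     # larger L1 data cache

    def access(self, addr):
        # Hardware probes L0 and L1 in parallel; modeled sequentially.
        if addr in self.l0:
            return ("L0 hit", self.l0[addr])
        if addr in self.l1:
            self.l0[addr] = self.l1[addr]  # line fill into L0
            return ("L1 hit (filled L0)", self.l1[addr])
        return ("miss", None)

d = TwoLevelDataCache()
print(d.access(0x40))   # ('L1 hit (filled L0)', 'cache line')
print(d.access(0x40))   # ('L0 hit', 'cache line')
```

The second access hits L0 because the first one installed the line, which is exactly the latency win the abstract is after.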
    • 10. Granted Invention Patent
    • Symmetrical instructions queue for high clock frequency scheduling
    • US6122727A
    • 2000-09-19
    • US139056
    • 1998-08-24
    • David B. Witt
    • David B. Witt
    • G06F9/38; G06F15/00
    • G06F9/3822; G06F9/3836; G06F9/3838; G06F9/384; G06F9/3857
    • An instruction queue is physically divided into two (or more) instruction queues. Each instruction queue is configured to store a dependency vector for each instruction operation stored in that instruction queue. The dependency vector is evaluated to determine if the corresponding instruction operation may be scheduled for execution. Instruction scheduling logic in each physical queue may schedule instruction operations based on the instruction operations stored in that physical queue independent of the scheduling logic in other queues. The instruction queues evaluate the dependency vector in portions, during different phases of the clock. During a first phase, a first instruction queue evaluates a first portion of the dependency vectors and generates a set of intermediate scheduling request signals. During a second phase, the first instruction queue evaluates a second portion of the dependency vector and the intermediate scheduling request signal to generate a scheduling request signal. The second instruction queue may evaluate the portions of the dependency vector in the second phase and the first phase of the clock, respectively. In other words, the second instruction queue may operate 1/2 clock cycle off of the first instruction queue. Satisfaction of dependencies upon an instruction operation in the opposite queue may thereby propagate to scheduling of the dependent instruction operation in 1/2 clock cycle.
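The two-phase evaluation in the abstract, where each half of the dependency vector is checked in a different clock phase, can be sketched as a split bitmask check: phase one produces an intermediate request, phase two combines it with the other half. The split point and encoding are illustrative assumptions, not a timing-accurate model.

```python
# Sketch of evaluating a dependency vector in two phases: the low half
# yields an intermediate scheduling request, the high half finishes it.
# Illustrative only; real queues do this in circuit timing, not code.
def evaluate_in_phases(dep_vector, completed, half_bits):
    low_mask = (1 << half_bits) - 1
    # phase 1: check dependencies recorded in the low half
    phase1_ok = ((dep_vector & low_mask) & ~completed) == 0
    # phase 2: check the high half and combine with the intermediate
    phase2_ok = ((dep_vector & ~low_mask) & ~completed) == 0
    return phase1_ok and phase2_ok     # final scheduling request

# 6-entry vector split 3/3; this op depends on entries 1 and 4
vec = 0b010010
print(evaluate_in_phases(vec, completed=0b010010, half_bits=3))  # True
print(evaluate_in_phases(vec, completed=0b000010, half_bits=3))  # False
```

Splitting the evaluation this way is what lets the second physical queue run half a clock cycle offset from the first, so a dependency satisfied in the opposite queue propagates to the dependent operation's scheduling in half a cycle.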