    • 1. Granted invention patent
    • Block-based branch prediction using a target finder array storing target sub-addresses
    • US5608886A
    • 1997-03-04
    • US298778
    • 1994-08-31
    • James S. Blomgren; Earl T. Cohen; Brian R. Baird
    • James S. Blomgren; Earl T. Cohen; Brian R. Baird
    • G06F9/318; G06F9/38; G06F9/30
    • G06F9/3806; G06F9/30174; G06F9/30189; G06F9/30196; G06F9/322; G06F9/3836; G06F9/3861
    • A target finder array in the instruction cache contains a lower portion of the target address and a block encoding indicating if the target address is within the same 2K-byte block that the branch instruction is in, or if the target address is in the next or previous 2K-byte block. The upper portion of the target address, its block number, which corresponds to the starting address of a 2K block, is generated from the target finder simply by taking the upper portion or block number of the branch instruction and incrementing and decrementing it, and using the block encoding in the finder to select either the unmodified block number of the branch instruction, or the incremented or decremented block number of the branch instruction. The lower portion of the target address that was stored in the finder is concatenated with the selected block number to get the predicted target address. The target address can be predicted in parallel with reading an instruction out of the cache, making the target available at the same time the branch instruction is available, eliminating pipeline stalls for correctly predicted branches. The initially predicted target address in the finder is generated by a quick decode of the instruction and is written when the cache is loaded from memory. The initial prediction does not have to be accurate because branch resolution logic will update the finder on each branch resolution. Register indirect branches and exceptions may also be predicted. Two instruction sets may be accommodated by different block encodings to indicate the instruction set. By using the block encoding, the finder array is small and inexpensive.
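The address reconstruction described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract gives the 2K-byte block size, but the encoding values `SAME`, `NEXT`, and `PREV`, the 32-bit address width, and the function name `predict_target` are hypothetical and do not come from the patent.

```python
BLOCK_BITS = 11                 # 2K-byte blocks, so 11 low-order address bits
LOW_MASK = (1 << BLOCK_BITS) - 1
ADDR_MASK = 0xFFFFFFFF          # assumed 32-bit addresses

# Hypothetical block encodings; the abstract does not give the bit patterns.
SAME, NEXT, PREV = 0, 1, 2


def predict_target(branch_addr: int, finder_low: int, encoding: int) -> int:
    """Rebuild a predicted target from the finder's stored sub-address."""
    block = branch_addr >> BLOCK_BITS          # block number of the branch
    # All three candidate block numbers are formed; the encoding selects one.
    candidates = {SAME: block, NEXT: block + 1, PREV: block - 1}
    selected = candidates[encoding]
    # Concatenate the selected block number with the stored lower portion.
    return ((selected << BLOCK_BITS) | (finder_low & LOW_MASK)) & ADDR_MASK
```

Because only the lower 11 bits and a small encoding are stored per entry, the sketch also shows why the finder array can stay small: no full target address is ever kept.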
    • 2. Granted invention patent
    • Program watchpoint checking using paging with sub-page validity
    • US5598553A
    • 1997-01-28
    • US444813
    • 1995-05-18
    • David E. Richter; Earl T. Cohen; James S. Blomgren
    • David E. Richter; Earl T. Cohen; James S. Blomgren
    • G06F11/36; G06F12/10; G06F9/455
    • G06F11/3648; G06F12/1027; G06F12/1036; G06F12/109
    • Segmentation is added to a reduced instruction set computer (RISC) processor which supports paging. The arithmetic-logic-unit (ALU) is extended to allow for a 3-port addition so that the segment base can be added when the virtual address is being generated. Segment bounds checking is achieved by extending the paging system to allow for valid regions that are less than the full page size. Sub-page validity can mimic segmentation because a segment can be broken up into a number of full pages and one or more partially-valid pages at the segment boundaries. A page that is not wholly valid has an "event" on the page, and a memory reference to this page will either cause a software routine to be invoked to check the segment bound, or an extension to the TLB, called a sub-page validity buffer, is used to check if the reference was to a valid portion of the page. Events may also be defined for program watchpoints and defective memory locations. Segment bounds thus do not have to be compared for each access, and the bounds do not even have to be stored on the CPU die.
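A minimal sketch of the sub-page validity idea follows. It assumes a 4 KB page and a single contiguous valid range per page; the names `PageEntry`, `has_event`, and `check_access` are hypothetical, and the abstract does not say how the sub-page validity buffer actually encodes its ranges.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size; the abstract does not state one


@dataclass
class PageEntry:
    """Hypothetical page descriptor holding one valid byte range."""
    valid_start: int = 0
    valid_end: int = PAGE_SIZE  # exclusive upper bound

    @property
    def has_event(self) -> bool:
        # A page that is not wholly valid carries an "event".
        return not (self.valid_start == 0 and self.valid_end == PAGE_SIZE)


def check_access(entry: PageEntry, offset: int) -> bool:
    """Return True if a reference at this page offset is allowed."""
    if not entry.has_event:
        return True  # fully valid page: no bounds comparison needed
    # Only pages with an event pay for the range check.
    return entry.valid_start <= offset < entry.valid_end
```

The fast path is the point of the design: interior pages of a segment are fully valid, so only the boundary pages ever trigger a range check.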
    • 3. Granted invention patent
    • Address tracking and branch resolution in a processor with multiple execution pipelines and instruction stream discontinuities
    • US5542109A
    • 1996-07-30
    • US298771
    • 1994-08-31
    • James S. Blomgren; Earl T. Cohen
    • James S. Blomgren; Earl T. Cohen
    • G06F9/30; G06F9/32; G06F9/38
    • G06F9/30149; G06F9/322; G06F9/3863; G06F9/3875
    • An address of any desired instruction in a super-scalar processor is generated using address tracking logic. A sequential address register in the last stage of the processor's pipelines holds the address of the last or oldest instruction in the pipelines. This register is updated with a target address when a branch instruction is actually taken. A pipeline valid array contains valid bits for the instructions in the pipelines, and also contains the lengths of the instructions for complex instruction sets having instructions that vary in length. The address of the desired instruction is calculated as the sum of a base address and an adjustment value. The base address is the address of the last instruction which is stored in the sequential address register when there are no intervening taken branches between the desired instruction and the last instruction in the pipelines. When there is an intervening taken branch, the target address from the taken branch closest to the desired instruction is selected as the base address. The adjustment value is the sum of all the instruction lengths for instructions between the desired instruction and the last instruction, or the closest intervening taken branch if it exists. A branch resolver uses this address tracking logic to generate the address of a branch instruction being resolved, and the address of the following sequential instruction. A recovery address for branch mis-prediction sent to the instruction fetcher is the following sequential address when the branch is actually not taken, and is the target address when the branch is actually taken. The branch can be resolved in any pipeline stage.
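The base-plus-adjustment calculation in this abstract can be illustrated as below. This is a sketch under assumptions: the pipeline is modeled as a list ordered from the oldest instruction (index 0, whose address is in the sequential address register) to the youngest, and the names `Slot` and `instruction_address` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slot:
    """One valid pipeline entry: instruction length and, if it is a
    taken branch, its target address."""
    length: int
    taken_target: Optional[int] = None


def instruction_address(seq_addr: int, slots: List[Slot], i: int) -> int:
    """Address of slots[i], given the address of the oldest instruction."""
    base = seq_addr   # default base: the sequential address register
    start = 0
    # Look for the intervening taken branch closest to the desired instruction.
    for j in range(i - 1, -1, -1):
        if slots[j].taken_target is not None:
            base = slots[j].taken_target
            start = j + 1
            break
    # Adjustment: sum of instruction lengths between the base and slots[i].
    return base + sum(s.length for s in slots[start:i])
```

For example, with `slots = [Slot(2), Slot(3, taken_target=0x400), Slot(1), Slot(4)]` and `seq_addr = 0x100`, the instruction at index 3 follows the taken branch at index 1, so its address is the target plus the one-byte instruction between them.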
    • 4. Granted invention patent
    • Reduced-modulus address generation using sign-extension and correction
    • US5511017A
    • 1996-04-23
    • US252579
    • 1994-06-01
    • Earl T. Cohen; James S. Blomgren
    • Earl T. Cohen; James S. Blomgren
    • G06F7/50; G06F7/509; G06F9/30; G06F9/355; G06F12/02; G06F7/38
    • G06F7/509; G06F12/0292; G06F9/30101; G06F9/30116; G06F9/3013; G06F9/355; G06F9/3552; G06F7/49931; G06F7/49994
    • A mixed-modulo address generation unit has several inputs, preferably three. The unit can effectively add together a subset of these inputs in a reduced modulus, and simultaneously add this partial sum to a full-width input using a full modulus, the full modulus being greater than the reduced modulus. Reduced-width address components, such as 16-bit components with a 32-bit adder, are applied to the subset of inputs. The mixed modulo address generation unit sign-extends to 32-bits one input that includes a sign bit, the input being in the subset of inputs. Each input in the subset of inputs is applied to a carry-generate unit which signals if the partial sum is equal to or exceeds the reduced modulus. Under normal conditions, the full-modulus sum from the adder is output as a linear address. However, if the carry-generate unit signals a carry-out, and the sign bit indicates a positive number, then the full-modulus sum is recirculated to one of the adder's inputs and a correction term, equal to the two's complement of the reduced modulus, is added to produce the linear address. If the carry generate unit does not signal a carry-out, but the sign bit indicates a negative number, then the full-modulus sum is recirculated to one of the adder's inputs and a correction term, equal to the reduced modulus, is added to produce the linear address.
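The correction rule in this abstract can be checked with a small model. It is a sketch, assuming the 16-bit reduced modulus and 32-bit full modulus given as an example in the abstract, and exactly two reduced-width inputs (`x` and a sign-carrying `disp`) plus one full-width `base`. The function names are hypothetical, and the second adder pass is modeled as a plain addition.

```python
REDUCED = 1 << 16   # reduced modulus (16-bit address components)
FULL = 1 << 32      # full modulus (32-bit adder)


def linear_address(base: int, x: int, disp: int) -> int:
    """Mixed-modulus sum: base + ((x + disp) mod 2**16), mod 2**32."""
    sign = (disp >> 15) & 1
    sext = disp - REDUCED if sign else disp      # sign-extend disp
    full_sum = (base + x + sext) % FULL          # first pass through the adder
    carry = (x + disp) >= REDUCED                # carry-generate on the subset
    if carry and not sign:
        # Recirculate and add the two's complement of the reduced modulus.
        full_sum = (full_sum - REDUCED) % FULL
    elif not carry and sign:
        # Recirculate and add the reduced modulus.
        full_sum = (full_sum + REDUCED) % FULL
    return full_sum


def reference(base: int, x: int, disp: int) -> int:
    """Direct two-step computation used to check the one-adder version."""
    return (base + ((x + disp) % REDUCED)) % FULL
```

In the two remaining cases (carry with a negative displacement, or no carry with a positive one) the first-pass sum is already correct, which is why the abstract calls that path the normal condition.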
    • 5. Granted invention patent
    • RAM-like test structure superimposed over rows of macrocells with added differential pass transistors in a CPU
    • US5951702A
    • 1999-09-14
    • US832922
    • 1997-04-04
    • Hank Lim; Earl T. Cohen; Peter J. Vigil; Jengwei Pan; James S. Blomgren
    • Hank Lim; Earl T. Cohen; Peter J. Vigil; Jengwei Pan; James S. Blomgren
    • G11C29/04; G11C29/00; G11C7/00
    • G11C29/04
    • A test structure is added to a microprocessor. The test structure is a RAM-like array of scan-clock word lines which selects a row of macrocells to be read or written. Perpendicular to the scan-clock word lines and the rows of macrocells are scan-data bit lines. Each testable macrocell has true and complement signal nodes that are connected to a pair of scan-data bit lines through a pair of n-channel pass transistors. The gates of the pass transistors are controlled by the scan-clock word line. The true and complement signal nodes are the cross-coupled inverters or gates in a latch. The latch is written or loaded by driving opposite data values onto the pair of scan-data bit lines when the pass transistors are activated by the scan-clock word line. The macrocells have random widths and thus do not form regular columns, so the columns of scan-data bit lines must be expanded to accommodate the various macrocell widths. Non-storage macrocells such as logic gates and buffers can be read but not written using the pass transistors connected to true and complement nodes in the macrocell. Reading causes a small voltage difference to be generated on the scan-data bit lines which is sensed by a sense amplifier. Only two n-channel transistors are added to a macrocell to make the macrocell testable. Thus testing is added with minimal area, cost, and delay to the macrocell.
    • 6. Granted invention patent
    • Merge/mask, rotate/shift, and boolean operations from two instruction sets executed in a vectored mux on a dual-ALU
    • US5781457A
    • 1998-07-14
    • US649116
    • 1996-05-14
    • Earl T. Cohen; James S. Blomgren; David E. Richter
    • Earl T. Cohen; James S. Blomgren; David E. Richter
    • G06F7/575; G06F7/76; G06F7/38
    • G06F7/764; G06F7/575; G06F7/762
    • A Boolean logic unit (BLU) features a vectored mux. Boolean instructions are executed by applying operands to the select inputs but truth-table signals to the data inputs. Merge and mask operations are performed by reversing the connection and inputting the operands to the data inputs but applying a merge mask to the select inputs. A byte-spreader copies byte or 16-bit operands to 32-bits before being rotated and merged by the vectored mux. A rotator is used to rotate an operand before being applied to the data input of the vectored mux so that compound rotate-merge operations can be executed in a single step through the vectored mux. A carry flag may also be merged in during a multi-step bit-test instruction. Complex CISC instructions such as rotate-through-carry and shift-double are executed in multiple steps on the vectored mux. Intermediate results are stored in the multiplier-quotient temporary registers which are normally used for multiply and divide instructions. A RISC ALU using the vectored mux BLU is modified only slightly to support execution of CISC instructions. Merge, mask, rotate, shift, and Boolean operations of both RISC and CISC instruction sets are executed in the same ALU because of the inherent flexibility of the vectored mux architecture.
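The dual use of the vectored mux (operands on the select inputs for Boolean operations, operands on the data inputs for merges) can be modeled in software. This sketch assumes a 32-bit datapath and one 4-to-1 mux per bit position. The truth-table bit ordering and the way `merge` ties off the second select input are illustrative choices, not details taken from the patent.

```python
WIDTH = 32
ALL_ONES = (1 << WIDTH) - 1


def vectored_mux(data, sel1: int, sel0: int) -> int:
    """Per-bit 4-to-1 mux: bit i of the result comes from data[idx],
    where idx is formed from bit i of sel1 and sel0."""
    out = 0
    for i in range(WIDTH):
        idx = (((sel1 >> i) & 1) << 1) | ((sel0 >> i) & 1)
        out |= ((data[idx] >> i) & 1) << i
    return out


def boolean_op(truth_table: int, a: int, b: int) -> int:
    """Operands drive the selects; truth-table bits drive the data inputs.
    Bit k of truth_table is the result for (a_bit, b_bit) = divmod(k, 2)."""
    data = [ALL_ONES if (truth_table >> k) & 1 else 0 for k in range(4)]
    return vectored_mux(data, a, b)


def merge(a: int, b: int, mask: int) -> int:
    """Connections reversed: operands drive the data inputs and the merge
    mask drives a select. A mask bit of 1 takes the bit from b."""
    return vectored_mux([a, a, b, b], mask, 0)
```

With this ordering, `0b1000` is AND, `0b1110` is OR, and `0b0110` is XOR, so any two-input Boolean instruction reduces to choosing a four-bit constant.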
    • 7. Granted invention patent
    • Inexact leading-one/leading-zero prediction integrated with a floating-point adder
    • US5633819A
    • 1997-05-27
    • US547396
    • 1995-10-24
    • Cheryl S. Brashears; James S. Blomgren; Earl T. Cohen
    • Cheryl S. Brashears; James S. Blomgren; Earl T. Cohen
    • G06F5/01; G06F7/50; G06F7/74; G06F7/38
    • G06F7/74; G06F5/012; G06F7/485; G06F7/503
    • The sum from a floating point adder is normalized by an initial shift based on a prediction for the position of the leading one or zero in the sum. This leading-one/zero prediction is based not on the operands input to the adder, nor the result from the adder, but on the intermediate generate and propagate signals within the adder. The adder has a first stage that reduces each bit-position to a generate and a propagate signal. The adder's second stage propagates the carries in the adder using these generate and propagate signals to generate the sum. Thus the adder's first-stage logic is also used for the leading one/zero prediction, reducing cost and complexity. An ECL half-adder cell is preferably used for the adder's first stage. A zero output is added to the ECL half-adder cell at minimal cost. The shift for the leading one/zero prediction is accomplished in two stages, with a selective complement of negative sums between the two-stage shift. This allows more time for a more exact prediction after the first coarse shift. The final exact detection of the leading one is pipelined to detect the sum after the complementor but before the second stage of the shifter. This allows the final exact detection of the leading one to occur in parallel with the second stage of the shifter, reducing the delay for generating the final normalized sum by a final shifter.
    • 8. Granted invention patent
    • Master-slave cache system for instruction and data cache memories
    • US5551001A
    • 1996-08-27
    • US267658
    • 1994-06-29
    • Earl T. Cohen; Russell W. Tilleman; Jay C. Pattin; James S. Blomgren
    • Earl T. Cohen; Russell W. Tilleman; Jay C. Pattin; James S. Blomgren
    • G06F12/08
    • G06F12/0897; G06F12/0848; G06F12/0857; G06F12/0831
    • A master-slave cache system has a large, set-associative master cache, and two smaller direct-mapped slave caches, a slave instruction cache for supplying instructions to an instruction pipeline of a processor, and a slave data cache for supplying data operands to an execution pipeline of the processor. The master cache and the slave caches are tightly coupled to each other. This tight coupling allows the master cache to perform most cache management operations for the slave caches, freeing the slave caches to supply a high bandwidth of instructions and operands to the processor's pipelines. The master cache contains tags that include valid bits for each slave, allowing the master cache to determine if a line is present and valid in either of the slave caches without interrupting the slave caches. The master cache performs all search operations required by external snooping, cache invalidation, cache data zeroing instructions, and store-to-instruction-stream detection. The master cache interrupts the slave caches only when the search reveals that a line is valid in a slave cache, the master cache causing the slave cache to invalidate the line. A store queue is shared between the master cache and the slave data cache. Store data is written from the store queue directly in to both the slave data cache and the master cache, eliminating the need for the slave data cache to write data through to the master cache. The master-slave cache system also eliminates the need for a second set of address tags for snooping and coherency operations. The master cache can be large and designed for a low miss rate, while the slave caches are designed for the high speed required by the processor's pipelines.
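The role of the per-slave valid bits in the master tags can be sketched as follows. This is a loose software model, not the hardware design: the master cache is a plain dictionary with no sets or ways, and the names `MasterTag`, `MasterCache`, `fill`, and `snoop_invalidate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MasterTag:
    """Master-cache tag with one valid bit per slave cache."""
    in_icache: bool = False
    in_dcache: bool = False


class MasterCache:
    def __init__(self):
        self.tags = {}               # line address -> MasterTag
        self.slave_invalidates = []  # log of interruptions sent to slaves

    def fill(self, line_addr, icache=False, dcache=False):
        self.tags[line_addr] = MasterTag(icache, dcache)

    def snoop_invalidate(self, line_addr) -> bool:
        """Handle an external snoop entirely in the master cache.
        A slave is interrupted only if its valid bit is set."""
        tag = self.tags.pop(line_addr, None)
        if tag is None:
            return False             # miss: the slaves are never disturbed
        if tag.in_icache:
            self.slave_invalidates.append(("icache", line_addr))
        if tag.in_dcache:
            self.slave_invalidates.append(("dcache", line_addr))
        return True
```

The log stays empty for snoops that miss or that hit lines held only in the master cache, which mirrors the abstract's claim that the slaves keep serving the pipelines during most cache management operations.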
    • 9. Granted invention patent
    • Emulation of segment bounds checking using paging with sub-page validity
    • US5440710A
    • 1995-08-08
    • US207857
    • 1994-03-08
    • David E. Richter; Earl T. Cohen; James S. Blomgren
    • David E. Richter; Earl T. Cohen; James S. Blomgren
    • G06F11/36; G06F12/10
    • G06F11/3648; G06F12/1027; G06F12/1036; G06F12/109
    • Segmentation is added to a reduced instruction set computer (RISC) processor which supports paging. The arithmetic-logic-unit (ALU) is extended to allow for a 3-port addition so that the segment base can be added when the virtual address is being generated. Segment bounds checking is achieved by extending the paging system to allow for valid regions that are less than the full page size. Sub-page validity can mimic segmentation because a segment can be broken up into a number of full pages and one or more partially-valid pages at the segment boundaries. A page that is not wholly valid has an "event" on the page, and a memory reference to this page will either cause a software routine to be invoked to check the segment bound, or an extension to the TLB, called a sub-page validity buffer, is used to check if the reference was to a valid portion of the page. Events may also be defined for program watchpoints and defective memory locations. Segment bounds thus do not have to be compared for each access, and the bounds do not even have to be stored on the CPU die.
    • 10. Granted invention patent
    • Method and apparatus for address transfers, system serialization, and centralized cache and transaction control, in a symetric multiprocessor system
    • US06466825B1
    • 2002-10-15
    • US09927717
    • 2001-08-10
    • Yuanlong Wang; Zong Yu; Xiaofan Wei; Earl T. Cohen; Brian R. Baird; Daniel Fu
    • Yuanlong Wang; Zong Yu; Xiaofan Wei; Earl T. Cohen; Brian R. Baird; Daniel Fu
    • G05B19/18
    • G06F12/0822
    • A preferred embodiment of a symmetric multiprocessor system includes a switched fabric (switch matrix) for data transfers that provides multiple concurrent buses that enable greatly increased bandwidth between processors and shared memory. A Transaction Controller, Transaction Bus, and Transaction Status Bus are used for serialization, centralized cache control, and highly pipelined address transfers. The shared Transaction Controller serializes transaction requests from Initiator devices that can include CPU/Cache modules and Peripheral Bus modules. The Transaction Bus of an illustrative embodiment is implemented using segmented buses, distributed muxes, and point-to-point wiring, and supports transaction processing at a rate of one transaction per clock cycle. The Transaction Controller monitors the Transaction Bus, maintains a set of duplicate cache-tags for all CPU/Cache modules, maps addresses to Target devices, performs centralized cache control for all CPU/Cache modules, filters unnecessary Cache transactions, and routes necessary transactions to Target devices over the Transaction Status Bus. The Transaction Status Bus includes both bus-based and point-to-point control of the target devices. A modified rotating priority scheme is used to provide Starvation-free support for Locked buses and memory resources via backoff operations. Speculative memory operations are supported to further enhance performance.