    • 1. Granted Invention Patent
    • Distributed cache coherence at scalable requestor filter pipes that accumulate invalidation acknowledgements from other requestor filter pipes using ordering messages from central snoop tag
    • US07366847B2
    • 2008-04-29
    • US11307413
    • 2006-02-06
    • David A. Kruckemyer, Kevin B. Normoyle, Robert G. Hathaway
    • G06F12/00
    • G06F12/082; G06F12/0828
    • A multi-processor, multi-cache system has filter pipes that store entries for request messages sent to a central coherency controller. The central coherency controller orders requests from filter pipes using coherency rules but does not track completion of invalidations. The central coherency controller reads snoop tags to identify sharing caches having a copy of a requested cache line. The central coherency controller sends an ordering message to the requesting filter pipe. The ordering message has an invalidate count indicating the number of sharing caches. Each sharing cache receives an invalidation message from the central coherency controller, invalidates its copy of the cache line, and sends an invalidation acknowledgement message to the requesting filter pipe. The requesting filter pipe decrements the invalidate count until all sharing caches have acknowledged invalidation. All ordering, data, and invalidation acknowledgement messages must be received by the requesting filter pipe before loading the data into its cache.
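The abstract's counting scheme can be illustrated with a small model: the requesting filter pipe learns how many sharing caches must invalidate from the ordering message, collects acknowledgements, and only fills its cache once ordering, data, and every ack have arrived. This is a minimal Python sketch; the class and method names are illustrative, not from the patent.

```python
class RequestorFilterPipe:
    """Hypothetical model of one requestor filter pipe entry."""

    def __init__(self):
        self.invalidate_count = None  # set by the ordering message
        self.acks_received = 0        # invalidation acks collected so far
        self.data = None              # cache-line data from memory or a cache

    def on_ordering_message(self, invalidate_count):
        # The central coherency controller counted the sharing caches
        # by reading its snoop tags.
        self.invalidate_count = invalidate_count

    def on_data(self, data):
        self.data = data

    def on_invalidation_ack(self):
        # Acks arrive directly from sharing caches and may race the
        # ordering message, so they are counted separately.
        self.acks_received += 1

    def may_fill(self):
        # Ordering, data, and all invalidation acks must be received
        # before the line is loaded into the requester's cache.
        return (self.invalidate_count is not None
                and self.data is not None
                and self.acks_received >= self.invalidate_count)
```

Counting acks separately from the ordering message matters because the central controller does not track completion of invalidations; the requester alone reconciles the count.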
    • 2. Granted Invention Patent
    • Write-back cache with different ECC codings for clean and dirty lines with refetching of uncorrectable clean lines
    • US07437597B1
    • 2008-10-14
    • US10908586
    • 2005-05-18
    • David A. Kruckemyer, Kevin B. Normoyle, Jack H. Choquette
    • G06F11/10
    • G06F11/1064
    • A write-back cache has error-correction code (ECC) fields storing ECC bits for cache lines. Clean cache lines are re-fetched from memory when an ECC error is detected. Dirty cache lines are corrected using the ECC bits or signal an uncorrectable error. The type of ECC code stored is different for clean and dirty lines. Clean lines use an error-detection code that can detect longer multi-bit errors than the error correction code used by dirty lines. Dirty lines use a correction code that can correct a bit error in the dirty line, while the detection code for clean lines may not be able to correct any errors. Dirty lines' ECC is optimized for correction while clean lines' ECC is optimized for detection. A single-error-correction, double-error-detection (SECDED) code may be used for dirty lines while a triple-error-detection code is used for clean lines.
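The clean/dirty ECC policy reduces to two decisions: which code to store when a line is written, and what to do on an error. A minimal Python sketch of that policy, assuming the SECDED / triple-error-detect split named in the abstract (function names are illustrative):

```python
def ecc_code_for(dirty: bool) -> str:
    """Code selection at write time: dirty lines get a correction-
    optimized SECDED code, clean lines a detection-optimized code
    that catches longer multi-bit errors."""
    return "SECDED" if dirty else "triple-error-detect"

def on_ecc_error(dirty: bool, correctable: bool) -> str:
    """Policy on a detected ECC error. Clean lines still have a valid
    copy in memory, so they are simply refetched; dirty lines are the
    only copy, so they are corrected if possible, else the error is
    signalled as uncorrectable."""
    if not dirty:
        return "refetch"
    return "correct" if correctable else "uncorrectable"
```

The asymmetry works because a clean line never needs correction: memory holds an identical copy, so detection strength can be maximized instead.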
    • 3. Granted Invention Patent
    • Duplicate snoop tags partitioned across multiple processor/cache chips in a multi-processor system
    • US07225300B1
    • 2007-05-29
    • US10711387
    • 2004-09-15
    • Jack H. Choquette, David A. Kruckemyer, Robert G. Hathaway
    • G06F13/00; G06F12/00
    • G06F12/0831
    • Several cluster chips and a shared main memory are connected by interconnect buses. Each cluster chip has multiple processors using multiple level-2 local caches, two memory controllers and two snoop tag partitions. The interconnect buses connect all local caches to all snoop tag partitions on all cluster chips. Each snoop tag partition has all the system's snoop tags for a partition of the main memory space. The snoop index is a subset of the cache index, with remaining chip-select and interleave address bits selecting which of the snoop tag partitions on the multiple cluster chips stores snoop tags for that address. The number of snoop entries in a snoop set is equal to a total number of cache entries in one cache index for all local caches on all cluster chips. Cache coherency request processing is distributed among the snoop tag partitions on different cluster chips, reducing bottlenecks.
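The partitioning idea is that the cache index splits into a snoop index plus chip-select/interleave bits that pick a snoop tag partition. A minimal Python sketch under assumed bit widths (a 64-byte line, 10-bit cache index, and 3 select bits for 4 chips x 2 partitions — none of these widths are stated in the abstract):

```python
def snoop_location(addr, line_bits=6, cache_index_bits=10, select_bits=3):
    """Map an address to (partition, snoop index). Bit placement is an
    assumption: low bits of the cache index are used as the partition
    select here, purely for illustration."""
    cache_index = (addr >> line_bits) & ((1 << cache_index_bits) - 1)
    snoop_index = cache_index >> select_bits          # subset of the cache index
    partition = cache_index & ((1 << select_bits) - 1)  # chip-select/interleave bits
    return partition, snoop_index
```

Because consecutive cache indexes map to different partitions, coherency request traffic spreads across the snoop tag partitions on all cluster chips, which is the bottleneck-reduction claim of the abstract.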
    • 6. Granted Invention Patent
    • Multi-level store merging in a cache and memory hierarchy
    • US09280479B1
    • 2016-03-08
    • US13478100
    • 2012-05-22
    • David A. Kruckemyer, John Gregory Favor, Matthew W. Ashcraft
    • G06F12/08
    • G06F12/0871; G06F9/3824; G06F12/0868; G06F12/0897; G06F2212/1024
    • A memory system having increased throughput is disclosed. Specifically, the memory system includes a first level write combining queue that reduces the number of data transfers between a level one cache and a level two cache. In addition, a second level write merging buffer can further reduce the number of data transfers within the memory system. The first level write combining queue receives data from the level one cache. The second level write merging buffer receives data from the first level write combining queue. The level two cache receives data from both the first level write combining queue and the second level write merging buffer. Specifically, the first level write combining queue combines multiple store transactions from the load store units to associated addresses. In addition, the second level write merging buffer merges data from the first level write combining queue.
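The throughput win comes from coalescing multiple stores to the same line into one transfer before it reaches the next cache level. A minimal Python sketch of the first-level write combining queue, assuming 64-byte lines (the line size, class name, and method names are illustrative):

```python
class WriteCombiningQueue:
    """Hypothetical first-level write combining queue: stores from the
    load/store units that hit the same cache line are merged into one
    entry, so the level-two cache sees one transfer per line."""
    LINE = 64  # assumed line size in bytes

    def __init__(self):
        self.entries = {}  # line address -> {byte offset: byte value}

    def store(self, addr, data: bytes):
        line = addr - addr % self.LINE
        entry = self.entries.setdefault(line, {})
        for i, b in enumerate(data):
            # A later store to the same offset overwrites the earlier one.
            entry[(addr % self.LINE) + i] = b

    def drain(self):
        """Hand the merged entries downstream: one transfer per line
        rather than one per store instruction."""
        out, self.entries = self.entries, {}
        return out
```

In the patented hierarchy a second-level write merging buffer applies the same merging again to entries drained from this queue before they reach the level-two cache.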
    • 7. Granted Invention Patent
    • Outstanding load miss buffer with shared entries
    • US08850121B1
    • 2014-09-30
    • US13250544
    • 2011-09-30
    • Matthew W. Ashcraft, John Gregory Favor, David A. Kruckemyer
    • G06F12/08
    • G06F9/3838; G06F9/30043; G06F9/34
    • A load/store unit with an outstanding load miss buffer and a load miss result buffer is configured to read data from a memory system having a level one cache. Missed load instructions are stored in the outstanding load miss buffer. The load/store unit retrieves data for multiple dependent missed load instructions using a single cache access and stores the data in the load miss result buffer. The outstanding load miss buffer stores a first missed load instruction in a first primary entry. Additional missed load instructions that are dependent on the first missed load instructions are stored in dependent entries of the first primary entry or in shared entries. If a shared entry is used for a missed load instruction the shared entry is associated with the primary entry.
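The entry-sharing structure can be modeled simply: the first miss to a line allocates a primary entry, later misses to the same line occupy its fixed dependent slots, and overflow spills into shared entries tied back to the primary. A Python sketch under an assumed two dependent slots per primary (the slot count and all names are illustrative):

```python
class OutstandingLoadMissBuffer:
    """Hypothetical model of the outstanding load miss buffer."""
    DEP_SLOTS = 2  # assumed dependent entries per primary entry

    def __init__(self):
        self.primaries = {}  # line address -> loads waiting on that line
        self.shared = []     # (line address, load) overflow entries

    def record_miss(self, line, load):
        waiting = self.primaries.setdefault(line, [])
        if len(waiting) <= self.DEP_SLOTS:
            waiting.append(load)            # primary or dependent entry
        else:
            self.shared.append((line, load))  # shared entry, associated
                                              # with the primary

    def fill(self, line):
        """A single cache access satisfies the primary and every load
        dependent on the same line, then frees all of its entries."""
        loads = self.primaries.pop(line, [])
        loads += [ld for (ln, ld) in self.shared if ln == line]
        self.shared = [(ln, ld) for (ln, ld) in self.shared if ln != line]
        return loads
```

The point of the shared pool is capacity: dependent slots per primary stay small and cheap, while bursts of loads to one hot line borrow from the shared entries instead of stalling.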
    • 8. Granted Invention Patent
    • Clock gating of sub-circuits within a processor execution unit responsive to instruction latency counter within processor issue circuit
    • US06971038B2
    • 2005-11-29
    • US10061695
    • 2002-02-01
    • Sribalan Santhanam, Vincent R. von Kaenel, David A. Kruckemyer
    • G06F1/08; G06F1/10; G06F1/32
    • G06F1/08; G06F1/10
    • A processor may include an execution circuit, an issue circuit coupled to the execution circuit, and a clock tree for clocking circuitry in the processor. The issue circuit issues an instruction to the execution circuit, and generates a control signal responsive to whether or not the instruction is issued to the execution circuit. The execution circuit includes at least a first subcircuit and a second subcircuit. A portion of the clock tree supplies a plurality of clocks to the execution circuit, including at least a first clock clocking the first subcircuit and at least a second clock clocking the second subcircuit. The portion of the clock tree is coupled to receive the control signal for collectively conditionally gating the plurality of clock, and is also configured to individually conditionally gate at least some of the plurality of clocks responsive to activity in the respective subcircuits of the execution circuit. A system on a chip may include several processors, and one or more of the processors may be conditionally clocked at the processor level.
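The per-subcircuit gating can be modeled with a latency counter per subcircuit: issuing an instruction loads the counter, and a subcircuit's clock stays enabled only while its counter is non-zero. A minimal Python sketch (the cycle-level behavior and all names are illustrative, not from the patent):

```python
class ExecUnitClockGate:
    """Hypothetical model: the issue circuit's latency counters drive
    individual clock enables for each subcircuit of the execution unit."""

    def __init__(self, subcircuits):
        self.busy = {s: 0 for s in subcircuits}  # remaining latency per subcircuit

    def issue(self, subcircuit, latency):
        # Issuing an instruction keeps that subcircuit's clock running
        # for its full pipeline latency.
        self.busy[subcircuit] = latency

    def tick(self):
        """One clock cycle: return the clock-enable for each subcircuit,
        then count down the outstanding latencies."""
        enabled = {s: n > 0 for s, n in self.busy.items()}
        for s in self.busy:
            if self.busy[s]:
                self.busy[s] -= 1
        return enabled
```

This captures the two levels in the claim: the issue circuit's control signal can gate all the execution unit's clocks collectively when nothing issues, while per-subcircuit activity gates them individually.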
    • 9. Granted Invention Patent
    • Method for cancelling speculative conditional delay slot instructions
    • US07296141B2
    • 2007-11-13
    • US10920766
    • 2004-08-18
    • David A. Kruckemyer
    • G06F9/00
    • G06F9/3842; G06F9/3844; G06F9/3859
    • A first tag is assigned to a branch instruction. Dependent on the type of branch instruction, a second tag is assigned to an instruction in the branch delay slot of the branch instruction. The second tag may equal the first tag if the branch delay slot is unconditional for that branch, and may equal a different tag if the branch delay slot is conditional for the branch. If the branch is mispredicted, the first tag is broadcast to pipeline stages that may have speculative instructions, and the first tag is compared to tags in the pipeline stages. If the tag in a pipeline stage matches the first tag, the instruction is not cancelled. If the tag mismatches, the instruction is cancelled.
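The tagging rule has two parts: assign the delay-slot instruction a tag based on whether the slot is conditional, and on a misprediction cancel every pipeline stage whose tag does not match the broadcast tag. A minimal Python sketch (function names are illustrative):

```python
def delay_slot_tag(branch_tag, conditional, new_tag):
    """An unconditional delay slot executes regardless of the branch
    outcome, so it shares the branch's tag and survives cancellation;
    a conditional delay slot is speculative and gets a different tag."""
    return new_tag if conditional else branch_tag

def cancel_on_mispredict(stage_tags, broadcast_tag):
    """On a misprediction the branch's tag is broadcast to the pipeline
    stages: a stage whose tag matches is kept, a mismatching tag marks
    that instruction for cancellation. Returns a cancel flag per stage."""
    return [tag != broadcast_tag for tag in stage_tags]
```

Note the comparison direction follows the abstract: matching the branch's own tag means the instruction was fetched before (or independent of) the misprediction and is kept; only the mismatching, speculatively fetched instructions are cancelled.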
    • 10. Granted Invention Patent
    • Method for identifying basic blocks with conditional delay slot instructions
    • US07219216B2
    • 2007-05-15
    • US11046439
    • 2005-01-28
    • David A. Kruckemyer
    • G09F9/00
    • G06F9/3842
    • A first tag is assigned to a branch instruction. Dependent on the type of branch instruction, a second tag is assigned to an instruction in the branch delay slot of the branch instruction. If the branch is mispredicted, the first tag is broadcast to pipeline stages that may have speculative instructions, and the first tag is compared to tags in the pipeline stages to determine which instructions to cancel. The assignment of tags for a fetch group of concurrently fetched instructions may be performed in parallel. A plurality of branch sequence numbers may be generated, and one of the plurality may be selected for each instruction responsive to the cumulative number of branch instructions preceding that instruction within the fetch group. The selection may be further responsive to whether or not the instruction is in a conditional delay slot.
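The parallel assignment can be sketched as follows: each instruction in the fetch group selects a branch sequence number from the base number plus the count of branches preceding it, adjusted when the instruction sits in a delay slot. This Python sketch is one plausible encoding consistent with the abstract, not the patent's exact scheme; here an unconditional delay slot keeps its branch's tag while a conditional one takes the new, cancellable tag.

```python
def assign_group_tags(base_seq, is_branch, in_uncond_delay_slot):
    """Hypothetical parallel tag assignment for one fetch group.
    Each instruction's tag is base_seq plus the number of branches
    preceding it in the group; an instruction in an unconditional
    delay slot subtracts one so it shares its branch's tag."""
    tags, preceding = [], 0
    for br, uncond in zip(is_branch, in_uncond_delay_slot):
        tags.append(base_seq + preceding - (1 if uncond else 0))
        if br:
            preceding += 1  # cumulative branch count within the group
    return tags
```

In hardware each instruction's selection depends only on a prefix count of branches, so all tags in the fetch group can be computed in the same cycle rather than serially.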