    • 21. Granted invention patent
    • Method and apparatus for modulo scheduled loop execution in a processor architecture
    • US07725696B1
    • 2010-05-25
    • US11867127
    • 2007-10-04
    • Wen-mei W. Hwu, Matthew C. Merten
    • G06F9/00
    • G06F8/4452, G06F9/325, G06F9/381, G06F9/3836, G06F9/3857, G06F9/3861
    • A processor method and apparatus that allows for the overlapped execution of multiple iterations of a loop, while allowing the compiler to include only a single copy of the loop body in the code and automatically managing which iterations are active. Since the prologue and epilogue are implicitly created and maintained within the hardware in the invention, a significant reduction in code size can be achieved compared to software-only modulo scheduling. Loops with iteration counts less than the number of concurrent iterations present in the kernel are also handled automatically. This hardware-enhanced scheme achieves the same performance as the fully-specified standard method. Furthermore, the hardware reduces the power requirement, as the entire fetch unit can be deactivated for a portion of the loop's execution. The basic design of the invention involves including, in the dispatch stage of a processor, a plurality of buffers for storing loop instructions, each of which is associated with an instruction decoder and its respective functional unit. Control logic is used to receive loop setup parameters and to control the selective issue of instructions from the buffers to the functional units.
    • 22. Granted invention patent
    • Method and apparatus for modulo scheduled loop execution in a processor architecture
    • US07302557B1
    • 2007-11-27
    • US09728441
    • 2000-12-01
    • Wen-mei W. Hwu, Matthew C. Merten
    • G06F9/30, G06F9/45
    • G06F8/4452, G06F9/325, G06F9/381, G06F9/3836, G06F9/3857, G06F9/3861
    • A processor method and apparatus that allows for the overlapped execution of multiple iterations of a loop, while allowing the compiler to include only a single copy of the loop body in the code and automatically managing which iterations are active. Since the prologue and epilogue are implicitly created and maintained within the hardware in the invention, a significant reduction in code size can be achieved compared to software-only modulo scheduling. Loops with iteration counts less than the number of concurrent iterations present in the kernel are also handled automatically. This hardware-enhanced scheme achieves the same performance as the fully-specified standard method. Furthermore, the hardware reduces the power requirement, as the entire fetch unit can be deactivated for a portion of the loop's execution. The basic design of the invention involves including, in the dispatch stage of a processor, a plurality of buffers for storing loop instructions, each of which is associated with an instruction decoder and its respective functional unit. Control logic is used to receive loop setup parameters and to control the selective issue of instructions from the buffers to the functional units.
    • 29. Invention patent application
    • MINIMIZING BANDWIDTH TO TRACK RETURN TARGETS BY AN INSTRUCTION TRACING SYSTEM
    • US20140337604A1
    • 2014-11-13
    • US13890654
    • 2013-05-09
    • Beeman C. Strong, Matthew C. Merten, Tong Li
    • G06F9/30
    • G06F9/30145, G06F9/3806, G06F9/3857, G06F11/3476, G06F11/3636
    • A processing device that minimizes the bandwidth an instruction tracing system uses to track return targets is disclosed. A processing device of the disclosure includes an instruction fetch unit comprising a return stack buffer (RSB) to predict a target address of a return (RET) instruction corresponding to a call (CALL) instruction. The processing device further includes a retirement unit comprising an instruction tracing module to initiate instruction tracing for instructions executed by the processing device, determine whether the target address of the RET instruction was mispredicted, and determine a value of a call depth counter (CDC) maintained by the instruction tracing module. When the target address of the RET instruction was not mispredicted and the value of the CDC is greater than zero, the module generates an indication that the RET instruction branches to the next linear instruction after the corresponding CALL instruction.
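The abstracts of US07725696B1 and US07302557B1 describe hardware that keeps a single copy of a modulo-scheduled loop kernel and selectively issues its instructions, so the prologue and epilogue never appear in the code. The Python sketch below models only that general idea in software. It assumes the loop body has already been split into stages that each take one kernel cycle; the names `run_modulo_scheduled` and `make_stages` are hypothetical, and the per-stage enable test stands in for the patent's control logic, whose actual design the abstract does not give.

```python
from typing import Callable, Dict, List, Tuple

# A stage is one slice of the loop body, applied to a given iteration index.
Stage = Callable[[int], None]


def run_modulo_scheduled(stages: List[Stage], trip_count: int) -> List[Tuple[int, int, int]]:
    """Issue one stored copy of the kernel, enabling each stage per cycle.

    At kernel cycle t, stage s works on iteration t - s. The stage is issued
    only if that iteration exists (0 <= t - s < trip_count), so the ramp-up
    (prologue) and drain (epilogue) come from the enable test rather than
    from extra copies of the loop body. Returns (cycle, stage, iteration)
    tuples describing what was issued.
    """
    trace: List[Tuple[int, int, int]] = []
    if trip_count <= 0:
        return trace
    total_cycles = trip_count + len(stages) - 1
    for t in range(total_cycles):
        for s, stage in enumerate(stages):
            i = t - s
            if 0 <= i < trip_count:
                stage(i)
                trace.append((t, s, i))
    return trace


def make_stages(src: List[int], dst: List[int]) -> List[Stage]:
    """Hypothetical 3-stage loop body: load, scale by 10, store.

    Intermediate values are keyed by iteration (like rotating registers),
    so stages of different iterations issued in the same cycle never clash.
    """
    loaded: Dict[int, int] = {}
    scaled: Dict[int, int] = {}

    def load(i: int) -> None:
        loaded[i] = src[i]

    def scale(i: int) -> None:
        scaled[i] = loaded[i] * 10

    def store(i: int) -> None:
        dst[i] = scaled[i]

    return [load, scale, store]


if __name__ == "__main__":
    src = [1, 2, 3, 4]
    dst = [0] * len(src)
    for cycle, stage, iteration in run_modulo_scheduled(make_stages(src, dst), len(src)):
        print(f"cycle {cycle}: stage {stage} of iteration {iteration}")
    print(dst)  # [10, 20, 30, 40]
```

Because stage s only issues when iteration t - s exists, the early cycles (before every stage is enabled) behave as the prologue and the late cycles (after the last iteration has started) behave as the epilogue. A trip count smaller than the stage count, such as 2 iterations through 3 stages, needs no special-case code.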
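The abstract of US20140337604A1 states when a RET can be traced compactly: the return stack buffer (RSB) predicted its target correctly and the call depth counter (CDC) is greater than zero. The Python sketch below models that decision. The abstract does not say how the CDC is updated, so this model assumes it is reset when tracing starts, incremented on each traced CALL, decremented on each compactly traced RET, and conservatively reset after any RET traced with a full address. The class `ReturnTracer`, the packet tuples `RET_NLI` and `RET_TARGET`, and the RSB depth of 16 are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Packet = Tuple[object, ...]


@dataclass
class ReturnTracer:
    """Toy model of RET tracing with an RSB and a call depth counter (CDC).

    The CDC update rules here are assumptions; the abstract only says the CDC
    is maintained by the tracing module and must be greater than zero.
    """

    rsb_depth: int = 16  # hypothetical RSB size
    rsb: List[int] = field(default_factory=list)
    cdc: int = 0
    tracing: bool = False
    packets: List[Packet] = field(default_factory=list)

    def start_tracing(self) -> None:
        self.tracing = True
        self.cdc = 0  # no CALL has been traced yet

    def on_call(self, next_linear_ip: int) -> None:
        # The RSB runs whether or not tracing is on; the oldest entry is
        # dropped when it is full.
        if len(self.rsb) == self.rsb_depth:
            self.rsb.pop(0)
        self.rsb.append(next_linear_ip)
        if self.tracing:
            self.cdc += 1  # assumption: count CALLs seen while tracing

    def on_ret(self, actual_target: int) -> None:
        predicted: Optional[int] = self.rsb.pop() if self.rsb else None
        if not self.tracing:
            return
        mispredicted = predicted != actual_target
        if not mispredicted and self.cdc > 0:
            # Compact indication: "returned to the instruction after the
            # matching CALL"; no address bytes are emitted.
            self.packets.append(("RET_NLI",))
            self.cdc -= 1
        else:
            # Full target address; CDC reset (assumption) so later RETs are
            # not compressed against a call the decoder may not have seen.
            self.packets.append(("RET_TARGET", actual_target))
            self.cdc = 0


if __name__ == "__main__":
    tracer = ReturnTracer()
    tracer.on_call(0x0500)   # caller entered before tracing starts
    tracer.start_tracing()
    tracer.on_call(0x1005)
    tracer.on_call(0x2005)
    tracer.on_ret(0x2005)    # predicted correctly, CDC = 2 -> compact
    tracer.on_ret(0x1005)    # predicted correctly, CDC = 1 -> compact
    tracer.on_ret(0x0500)    # predicted correctly, but CDC = 0 -> full address
    print(tracer.packets)
```

The third RET in the usage example returns into a caller entered before tracing began. The RSB still predicts it correctly, but the CDC is zero, so a trace decoder would have no traced CALL to match it against, and the full target address is emitted instead of the compact indication.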