    • 1. Granted invention patent
    • Dynamic control of SIMDs
    • US09311102B2
    • 2016-04-12
    • US13180721
    • 2011-07-12
    • Tushar K. Shah; Michael J. Mantor; Brian Emberling
    • G06F9/38; G06F1/32; G06F9/30; G06T15/00; G06F9/50
    • G06F9/3867; G06F1/3203; G06F1/3237; G06F9/30101; G06F9/3842; G06F9/5094; G06T15/005; Y02D10/128; Y02D10/22
    • Systems and methods to improve performance in a graphics processing unit are described herein. Embodiments achieve power saving in a graphics processing unit by dynamically activating/deactivating individual SIMDs in a shader complex that comprises multiple SIMD units. On-the-fly dynamic disabling and enabling of individual SIMDs provides flexibility in achieving a required performance and power level for a given processing application. Embodiments of the invention also achieve dynamic medium grain clock gating of SIMDs in a shader complex. Embodiments reduce switching power by shutting down clock trees to unused logic by providing a clock on demand mechanism. In this way, embodiments enhance clock gating to save more switching power for the duration of time when SIMDs are idle (or assigned no work). Embodiments can also save leakage power by power gating SIMDs for a duration when SIMDs are idle for an extended period of time.
    • 2. Granted invention patent
    • Scalable and unified compute system
    • US08558836B2
    • 2013-10-15
    • US12476161
    • 2009-06-01
    • Michael J. Mantor; Jeffrey T. Brady; Mark C. Fowler; Marcos P. Zini
    • G06T15/50
    • G06T1/20; G06T1/60; G09G5/363; G09G2360/06
    • A Scalable and Unified Compute System performs scalable, repairable general purpose and graphics shading operations, memory load/store operations and texture filtering. A Scalable and Unified Compute Unit Module comprises a shader pipe array, a texture mapping unit, and a level one texture cache system. It accepts ALU instructions, input/output instructions, and texture or memory requests for a specified set of pixels, vertices, primitives, surfaces, or general compute work items from a shader program and performs associated operations to compute the programmed output data. The texture mapping unit accepts source data addresses and instruction constants in order to fetch, format, and perform instructed filtering interpolations to generate formatted results based on the specific corresponding data stored in a level one texture cache system. The texture mapping unit consists of an address generating system, a pre-formatter module, an interpolator module, an accumulator module and a format module.
    • 3. Granted invention patent
    • Distributed clock gating with centralized state machine control
    • US08316252B2
    • 2012-11-20
    • US12192530
    • 2008-08-15
    • Michael J. Mantor; Tushar K. Shah; Donald P. Lee
    • G06F1/00; H04L7/00
    • G06F1/3203; G06F1/10; G06F1/324; Y02D10/126
    • A method, computer program product, and system are provided for controlling a clock distribution network. For example, an embodiment of the method can include programming a predetermined delay time into a plurality of processing elements and controlling an activation and de-activation of these processing elements in a sequence based on the predetermined delay time. The processing elements are located in a system incorporating the clock distribution network, where the predetermined delay time can be programmed in a control register of a clock gate control circuit residing in the processing element. Further, when controlling the activation and de-activation of the processing elements, this activity can be controlled with a state machine based on the system's mode of operation. In controlling the activation and de-activation of the processing elements, the method described above can not only control the effects of di/dt in the system but also shut off clock signals in the clock distribution network when idle, thus reducing dynamic power consumption.
    • 8. Invention patent application
    • Distributed Clock Gating with Centralized State Machine Control
    • US20090300388A1
    • 2009-12-03
    • US12192530
    • 2008-08-15
    • Michael J. MANTOR; Tushar K. SHAH; Donald P. LEE
    • G06F1/32; G06F1/08
    • G06F1/3203; G06F1/10; G06F1/324; Y02D10/126
    • A method, computer program product, and system are provided for controlling a clock distribution network. For example, an embodiment of the method can include programming a predetermined delay time into a plurality of processing elements and controlling an activation and de-activation of these processing elements in a sequence based on the predetermined delay time. The processing elements are located in a system incorporating the clock distribution network, where the predetermined delay time can be programmed in a control register of a clock gate control circuit residing in the processing element. Further, when controlling the activation and de-activation of the processing elements, this activity can be controlled with a state machine based on the system's mode of operation. In controlling the activation and de-activation of the processing elements, the method described above can not only control the effects of di/dt in the system but also shut off clock signals in the clock distribution network when idle, thus reducing dynamic power consumption.
    • 9. Granted invention patent
    • Method and apparatus for executing a predefined instruction set
    • US06784888B2
    • 2004-08-31
    • US09969669
    • 2001-10-03
    • Ralph C. Taylor; Michael A. Mang; Michael J. Mantor
    • G06T15/00
    • G06F9/3001; G06F9/3017; G06T15/005
    • The occurrence of an (n+m) input operand instruction that requires more than n of its input operands from an n-output data source is recognized by a programmable vertex shader (PVS) controller. In turn, the PVS controller provides at least two substitute instructions, neither of which requires more than n operands from the n-output data source, to a PVS engine. A first of the substitute instructions is executed by the PVS engine to provide an intermediate result that is temporarily stored and used as an input to another of the at least two substitute instructions. In this manner, the present invention avoids the expense of additional or significantly modified memory. In one embodiment of the present invention, a pre-accumulator register internal to the PVS engine is used to store the intermediate result. In this manner, the present invention provides a relatively inexpensive solution for a relatively infrequent occurrence.
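
The instruction substitution described in the last abstract above (US06784888B2) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware design: the opcode names (MAD, MUL, ADD), the PREACC placeholder, the choice of n = 2, and the functions split_if_needed and execute are all hypothetical and do not come from the patent record. It shows a three-operand multiply-add that needs three reads from a two-output source being replaced by two instructions, with the intermediate product held in a pre-accumulator.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical value of "n": how many operands the data source can
# deliver to a single instruction.
N_SOURCE_OUTPUTS = 2

# Hypothetical name for the pre-accumulator used as an operand.
PREACC = "PREACC"


@dataclass(frozen=True)
class Instr:
    op: str                 # "MAD" (a*b + c), "MUL" (a*b) or "ADD" (a + b)
    srcs: Tuple[str, ...]   # register names, or PREACC


def split_if_needed(instr: Instr) -> List[Instr]:
    """Return substitutes when instr reads more than n operands from the source."""
    source_reads = [s for s in instr.srcs if s != PREACC]
    if len(source_reads) <= N_SOURCE_OUTPUTS:
        return [instr]
    if instr.op == "MAD":
        a, b, c = instr.srcs
        # First substitute produces the intermediate result; the second
        # consumes it from the pre-accumulator plus one more source operand.
        return [Instr("MUL", (a, b)), Instr("ADD", (PREACC, c))]
    raise ValueError(f"no substitution defined for {instr.op}")


def execute(program: List[Instr], regs: Dict[str, float]) -> float:
    """Run the program and return the result of the last instruction."""
    preacc = 0.0
    result = 0.0
    for instr in program:
        for sub in split_if_needed(instr):
            vals = [preacc if s == PREACC else regs[s] for s in sub.srcs]
            if sub.op == "MUL":
                result = vals[0] * vals[1]
            elif sub.op == "ADD":
                result = vals[0] + vals[1]
            elif sub.op == "MAD":
                result = vals[0] * vals[1] + vals[2]
            else:
                raise ValueError(f"unknown op {sub.op}")
            preacc = result  # every result is latched into the pre-accumulator
    return result


if __name__ == "__main__":
    regs = {"r0": 2.0, "r1": 3.0, "r2": 4.0}
    print(execute([Instr("MAD", ("r0", "r1", "r2"))], regs))
```

In this model the cost of the rare oversized instruction is one extra issue slot, while the source itself keeps its two outputs, which mirrors the trade-off the abstract describes.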