    • 21. Invention grant
    • Title: Graphics processing unit with shared arithmetic logic unit
    • Publication number: US08009172B2
    • Publication date: 2011-08-30
    • Application number: US11550344
    • Filing date: 2006-10-17
    • Inventors: Guofang Jiao; Brian Ruttenberg; Chun Yu; Yun Du
    • IPC: G06T1/20
    • CPC: G06T15/005
    • Abstract: This disclosure describes a graphics processing unit (GPU) pipeline that uses one or more shared arithmetic logic units (ALUs). In order to facilitate such sharing of ALUs, the stages of the disclosed GPU pipeline may be rearranged relative to conventional GPU pipelines. In addition, by rearranging the stages of the GPU pipeline, efficiencies may be achieved in the image processing. Unlike conventional GPU pipelines, for example, an attribute gradient setup stage can be located much later in the pipeline, and the attribute interpolator stage may immediately follow the attribute gradient setup stage. This allows sharing of an ALU by the attribute gradient setup and attribute interpolator stages. Several other techniques and features for the GPU pipeline are also described, which may improve performance and possibly achieve additional processing efficiencies. (See the code sketch after this record.)
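
To make the shared-ALU idea concrete, here is a minimal behavioral sketch in C++, not the patented hardware: two adjacent stages, attribute gradient setup and attribute interpolation, issue their multiply-add work to a single shared ALU object. The `SharedAlu` class, the `gradient_dx`/`interpolate` helpers, and the simple one-dimensional gradient math are illustrative assumptions, not taken from the patent.

```cpp
// Illustrative sketch only: two pipeline stages time-sharing one ALU.
// Class and function names are hypothetical, not from the patent.
#include <cstdio>

struct SharedAlu {
    long ops = 0;                      // count of ALU issues, to show the sharing
    float madd(float a, float b, float c) { ++ops; return a * b + c; }
};

// "Attribute gradient setup": rate of change of an attribute over dx,
// computed on the shared ALU.
float gradient_dx(SharedAlu& alu, float a0, float a1, float dx) {
    return alu.madd(a1 - a0, 1.0f / dx, 0.0f);
}

// "Attribute interpolator": evaluate the attribute at pixel offset px
// using that gradient, again on the same shared ALU.
float interpolate(SharedAlu& alu, float a0, float dadx, float px) {
    return alu.madd(dadx, px, a0);
}

int main() {
    SharedAlu alu;                                  // one ALU serves both stages
    float dadx = gradient_dx(alu, 0.25f, 0.75f, 8.0f);
    for (int px = 0; px < 4; ++px)
        std::printf("pixel %d: attr=%f\n", px, interpolate(alu, 0.25f, dadx, (float)px));
    std::printf("ALU issues: %ld\n", alu.ops);
    return 0;
}
```

Because interpolation consumes the gradient that setup just produced, placing the two stages back to back lets one ALU serve both with little idle time, which is the sharing opportunity the abstract points at.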
    • 22. Invention application
    • Title: Graphics Processors With Parallel Scheduling and Execution of Threads
    • Publication number: US20080074433A1
    • Publication date: 2008-03-27
    • Application number: US11533880
    • Filing date: 2006-09-21
    • Inventors: Guofang Jiao; Yun Du; Chun Yu
    • IPC: G06T1/00
    • CPC: G06T15/005
    • Abstract: A graphics processor capable of parallel scheduling and execution of multiple threads, and techniques for achieving parallel scheduling and execution, are described. The graphics processor may include multiple hardware units and a scheduler. The hardware units are operable in parallel, with each hardware unit supporting a respective set of operations. The hardware units may include an ALU core, an elementary function core, a logic core, a texture sampler, a load control unit, some other hardware unit, or a combination thereof. The scheduler dispatches instructions for multiple threads to the hardware units concurrently. The graphics processor may further include an instruction cache to store instructions for threads and register banks to store data. The instruction cache and register banks may be shared by the hardware units. (See the code sketch after this record.)
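
As a rough illustration of the dispatch pattern described above, the C++ sketch below lets several per-thread instruction streams issue to different hardware units in the same cycle, with each unit accepting at most one instruction per cycle. The unit list, the `Instr` struct, and the simple in-order, one-instruction-per-unit policy are assumptions made for the sketch, not the patented scheduler.

```cpp
// Illustrative sketch only: a scheduler dispatching per-thread instructions to
// whichever hardware unit handles that instruction class.
#include <cstdio>
#include <queue>
#include <string>
#include <vector>

enum class Unit { Alu, ElementaryFunc, Logic, TextureSampler, LoadControl };

struct Instr { int thread; Unit unit; std::string text; };

int main() {
    // Per-thread instruction streams (toy example).
    std::vector<std::queue<Instr>> threads(3);
    threads[0].push({0, Unit::Alu,            "mad r0, r1, r2, r3"});
    threads[0].push({0, Unit::TextureSampler, "sample r4, t0, s0"});
    threads[1].push({1, Unit::ElementaryFunc, "rcp r0, r1"});
    threads[1].push({1, Unit::Alu,            "add r2, r0, r3"});
    threads[2].push({2, Unit::LoadControl,    "load r0, [r7]"});

    const char* unit_name[] = {"ALU", "EFU", "LOGIC", "TEX", "LOAD"};

    // Each cycle, every hardware unit may accept one instruction, so different
    // threads can occupy different units concurrently.
    for (int cycle = 0; ; ++cycle) {
        bool busy[5] = {false, false, false, false, false};
        bool any = false;
        for (auto& q : threads) {
            if (q.empty()) continue;
            int u = static_cast<int>(q.front().unit);
            if (busy[u]) continue;                 // unit already issued this cycle
            busy[u] = true;
            any = true;
            std::printf("cycle %d: thread %d -> %s : %s\n",
                        cycle, q.front().thread, unit_name[u], q.front().text.c_str());
            q.pop();
        }
        if (!any) break;                           // all streams drained
    }
    return 0;
}
```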
    • 23. Invention application
    • Title: Multi-stage floating-point accumulator
    • Publication number: US20080046495A1
    • Publication date: 2008-02-21
    • Application number: US11506349
    • Filing date: 2006-08-18
    • Inventors: Yun Du; Chun Yu; Guofang Jiao
    • IPC: G06F7/38
    • CPC: G06F7/5095; G06F5/012; G06F7/485; G06F7/49936; G06F2207/3884
    • Abstract: A multi-stage floating-point accumulator includes at least two stages and is capable of operating at higher speed. In one design, the floating-point accumulator includes first and second stages. The first stage includes three operand alignment units, two multiplexers, and three latches. The three operand alignment units operate on a current floating-point value, a prior floating-point value, and a prior accumulated value. A first multiplexer provides zero or the prior floating-point value to the second operand alignment unit. A second multiplexer provides zero or the prior accumulated value to the third operand alignment unit. The three latches couple to the three operand alignment units. The second stage includes a 3-operand adder to sum the operands generated by the three operand alignment units, a latch, and a post alignment unit. (See the code sketch after this record.)
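
The accumulator described above breaks the loop-carried add so that the loop only has to close once per pair of inputs. A minimal behavioral sketch, assuming a software model rather than the patented latch, multiplexer, and alignment hardware: the first stage latches one input of a pair, and the second stage performs a single 3-operand add of the current input, the prior input, and the prior accumulated value.

```cpp
// Illustrative sketch only: behavioral model of a two-stage accumulator.
// Operand alignment and post-alignment details of the patent are not modeled.
#include <cstdio>
#include <vector>

float accumulate_two_stage(const std::vector<float>& x) {
    float acc = 0.0f;          // prior accumulated value (mux selects 0 initially)
    float prev = 0.0f;         // prior floating-point input (mux selects 0 initially)
    bool have_prev = false;
    for (float cur : x) {
        if (!have_prev) {      // stage 1: latch the first operand of the pair
            prev = cur;
            have_prev = true;
        } else {               // stage 2: 3-operand add consumes the pair
            acc = acc + prev + cur;
            have_prev = false;
        }
    }
    if (have_prev) acc += prev;   // odd element count: flush the latched operand
    return acc;
}

int main() {
    std::vector<float> x = {1.5f, 2.25f, -0.5f, 4.0f, 0.75f};
    std::printf("sum = %f\n", accumulate_two_stage(x));   // expect 8.000000
    return 0;
}
```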
    • 24. Invention application
    • Title: GRAPHICS PROCESSING UNIT WITH SHARED ARITHMETIC LOGIC UNIT
    • Publication number: US20080030512A1
    • Publication date: 2008-02-07
    • Application number: US11550344
    • Filing date: 2006-10-17
    • Inventors: Guofang Jiao; Brian Ruttenberg; Chun Yu; Yun Du
    • IPC: G06T1/20
    • CPC: G06T15/005
    • Abstract: This disclosure describes a graphics processing unit (GPU) pipeline that uses one or more shared arithmetic logic units (ALUs). In order to facilitate such sharing of ALUs, the stages of the disclosed GPU pipeline may be rearranged relative to conventional GPU pipelines. In addition, by rearranging the stages of the GPU pipeline, efficiencies may be achieved in the image processing. Unlike conventional GPU pipelines, for example, an attribute gradient setup stage can be located much later in the pipeline, and the attribute interpolator stage may immediately follow the attribute gradient setup stage. This allows sharing of an ALU by the attribute gradient setup and attribute interpolator stages. Several other techniques and features for the GPU pipeline are also described, which may improve performance and possibly achieve additional processing efficiencies.
    • 25. Invention application
    • Title: Unified virtual addressed register file
    • Publication number: US20070296729A1
    • Publication date: 2007-12-27
    • Application number: US11472701
    • Filing date: 2006-06-21
    • Inventors: Yun Du; Guofang Jiao; Chun Yu; De Dzwo Hsu
    • IPC: G09G5/36
    • CPC: G06F9/3851; G06F9/3012; G06F9/30123; G06F9/30138; G06F9/384; G06T15/005
    • Abstract: A multi-threaded processor is provided, such as a shader processor, having an internal unified memory space that is shared by a plurality of threads and is dynamically assigned to threads as needed. A mapping table maps virtual registers to available internal addresses in the unified memory space so that thread registers can be stored in contiguous or non-contiguous memory addresses. Dynamic sizing of the virtual registers allows flexible allocation of the unified memory space depending on the type and size of data in a thread register. Yet another feature provides an efficient method for storing graphics data in the unified memory space to improve fetch and store operations from the memory space. In particular, pixel data for four pixels in a thread are stored across four memory devices having independent input/output ports that permit the four pixels to be read in a single clock cycle for processing. (See the code sketch after this record.)
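
The register-file idea above is essentially a translation table plus a shared backing store. The C++ sketch below models that with a `(thread, vreg) -> (base, size)` map over one shared array and a first-fit contiguous allocator; the class name and the allocation policy are assumptions, and the patent's non-contiguous placement and four-bank pixel layout are not modeled.

```cpp
// Illustrative sketch only: virtual registers of many threads mapped into one
// shared (unified) register memory, with per-register sizing.
#include <cstdio>
#include <map>
#include <utility>
#include <vector>

class UnifiedRegisterFile {
    std::vector<float> memory_;                                  // shared physical space
    std::vector<bool> used_;                                     // slot occupancy
    std::map<std::pair<int, int>, std::pair<int, int>> map_;     // (thread, vreg) -> (base, size)
public:
    explicit UnifiedRegisterFile(int slots) : memory_(slots, 0.0f), used_(slots, false) {}

    // Allocate 'size' contiguous slots for a virtual register; false if no room.
    bool alloc(int thread, int vreg, int size) {
        for (int base = 0; base + size <= (int)used_.size(); ++base) {
            bool free = true;
            for (int i = 0; i < size; ++i) free = free && !used_[base + i];
            if (!free) continue;
            for (int i = 0; i < size; ++i) used_[base + i] = true;
            map_[{thread, vreg}] = {base, size};
            return true;
        }
        return false;
    }

    void release(int thread, int vreg) {             // free slots when a thread retires
        auto it = map_.find({thread, vreg});
        if (it == map_.end()) return;
        for (int i = 0; i < it->second.second; ++i) used_[it->second.first + i] = false;
        map_.erase(it);
    }

    float* lookup(int thread, int vreg) {            // virtual register -> physical address
        auto it = map_.find({thread, vreg});
        return it == map_.end() ? nullptr : &memory_[it->second.first];
    }
};

int main() {
    UnifiedRegisterFile rf(16);
    rf.alloc(/*thread=*/0, /*vreg=*/0, /*size=*/4);   // e.g. a 4-component attribute
    rf.alloc(/*thread=*/1, /*vreg=*/0, /*size=*/1);   // a scalar in the same space
    float* r = rf.lookup(0, 0);
    for (int i = 0; i < 4; ++i) r[i] = 0.25f * i;
    std::printf("thread0.v0[3] = %f\n", rf.lookup(0, 0)[3]);
    rf.release(0, 0);                                  // space returns to the shared pool
    return 0;
}
```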
    • 26. Invention application
    • Title: Processor core stack extension
    • Publication number: US20070282928A1
    • Publication date: 2007-12-06
    • Application number: US11448272
    • Filing date: 2006-06-06
    • Inventors: Guofang Jiao; Yun Du; Chun Yu
    • IPC: G06F17/30
    • CPC: G06F12/0875; G06F9/485
    • Abstract: In general, the disclosure is directed to techniques for controlling stack overflow. The techniques described herein utilize a portion of a common cache or memory located outside of the processor core as a stack extension. A processor core monitors a stack within the processor core and transfers the content of the stack to the stack extension outside of the processor core when the processor core stack exceeds a maximum number of entries. When the processor core determines the stack within the processor core falls below a minimum number of entries the processor core transfers at least a portion of the content maintained in the stack extension into the stack within the processor core. The techniques prevent malfunction and crash of threads executing within the processor core by utilizing stack extensions outside of the processor core. (See the code sketch after this record.)
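
The spill/fill behavior described above maps naturally onto a bounded stack with high- and low-water marks. The sketch below is a software analogy, not the patented hardware: when the on-core stack exceeds `kMax` entries, the oldest entries move to an off-core extension, and when it drains below `kMin` they move back. The thresholds and the deque/vector model are assumptions.

```cpp
// Illustrative sketch only: an on-core stack backed by an off-core extension.
#include <cstdio>
#include <deque>
#include <vector>

class ExtendedStack {
    static constexpr size_t kMax = 8;   // on-core capacity (high-water mark)
    static constexpr size_t kMin = 2;   // refill threshold (low-water mark)
    std::deque<int> core_;              // on-core stack, top at back
    std::vector<int> extension_;        // off-core extension (shared cache/memory)
public:
    void push(int v) {
        core_.push_back(v);
        if (core_.size() > kMax) {                      // spill the oldest entry off-core
            extension_.push_back(core_.front());
            core_.pop_front();
        }
    }
    int pop() {
        int v = core_.back();
        core_.pop_back();
        if (core_.size() < kMin && !extension_.empty()) {  // refill from the extension
            core_.push_front(extension_.back());
            extension_.pop_back();
        }
        return v;
    }
    size_t on_core() const { return core_.size(); }
    size_t off_core() const { return extension_.size(); }
};

int main() {
    ExtendedStack s;
    for (int i = 0; i < 12; ++i) s.push(i);             // deeper than the core can hold
    std::printf("on-core=%zu off-core=%zu\n", s.on_core(), s.off_core());
    while (s.on_core() + s.off_core() > 0) std::printf("%d ", s.pop());
    std::printf("\n");                                   // pops still come out in LIFO order
    return 0;
}
```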
    • 27. Invention application
    • Title: Graphics processor with arithmetic and elementary function units
    • Publication number: US20070273698A1
    • Publication date: 2007-11-29
    • Application number: US11441696
    • Filing date: 2006-05-25
    • Inventors: Yun Du; Guofang Jiao; Chun Yu; Alexei V. Bourd
    • IPC: G06T1/00
    • CPC: G06T1/20; G06F9/30167; G06F9/383; G06F9/3851; G06F9/3885
    • Abstract: A graphics processor capable of efficiently performing arithmetic operations and computing elementary functions is described. The graphics processor has at least one arithmetic logic unit (ALU) that can perform arithmetic operations and at least one elementary function unit that can compute elementary functions. The ALU(s) and elementary function unit(s) may be arranged such that they can operate in parallel to improve throughput. The graphics processor may also include fewer elementary function units than ALUs, e.g., four ALUs and a single elementary function unit. The four ALUs may perform an arithmetic operation on (1) four components of an attribute for one pixel or (2) one component of an attribute for four pixels. The single elementary function unit may operate on one component of one pixel at a time. The use of a single elementary function unit may reduce cost while still providing good performance. (See the code sketch after this record.)
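
To illustrate the two issue modes mentioned in the abstract, the sketch below applies four lockstep ALU lanes either to the four components of one pixel's attribute or to one component across four pixels, while a single scalar elementary-function unit handles one operand per issue. The `Vec4` layout, the multiply-add operation, and the reciprocal-square-root example are assumptions made for illustration.

```cpp
// Illustrative sketch only: four ALU lanes plus one scalar elementary function unit.
#include <array>
#include <cmath>
#include <cstdio>

using Vec4 = std::array<float, 4>;

// Four ALUs in lockstep: one multiply-add per lane.
Vec4 alu4_madd(const Vec4& a, const Vec4& b, const Vec4& c) {
    Vec4 r;
    for (int lane = 0; lane < 4; ++lane) r[lane] = a[lane] * b[lane] + c[lane];
    return r;
}

// Single elementary function unit: one scalar operand per issue (e.g. 1/sqrt).
float efu_rsqrt(float x) { return 1.0f / std::sqrt(x); }

int main() {
    // Mode 1: four components (x, y, z, w) of one pixel's attribute.
    Vec4 pixel_attr = alu4_madd({1, 2, 3, 4}, {0.5f, 0.5f, 0.5f, 0.5f}, {0, 0, 0, 0});

    // Mode 2: one component of the same attribute for four different pixels.
    Vec4 four_pixels = alu4_madd({1, 1, 1, 1}, {0.1f, 0.2f, 0.3f, 0.4f}, {1, 1, 1, 1});

    // The lone elementary function unit serves one component of one pixel per issue.
    float inv_len = efu_rsqrt(pixel_attr[0] * pixel_attr[0] + pixel_attr[1] * pixel_attr[1]);

    std::printf("pixel_attr.x=%f four_pixels[2]=%f inv_len=%f\n",
                pixel_attr[0], four_pixels[2], inv_len);
    return 0;
}
```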
    • 28. Invention grant
    • Title: Multi-threaded processor with deferred thread output control
    • Publication number: US08869147B2
    • Publication date: 2014-10-21
    • Application number: US11445100
    • Filing date: 2006-05-31
    • Inventors: Yun Du; Guofang Jiao; Chun Yu
    • IPC: G06F9/46; G06F9/48; G06F9/30; G06F9/38
    • CPC: G06F9/4881; G06F9/30123; G06F9/3836; G06F9/3851; G06F9/3855; G06F9/3857; Y02D10/24
    • Abstract: A multi-threaded processor is provided that internally reorders output threads thereby avoiding the need for an external output reorder buffer. The multi-threaded processor writes its thread results back to an internal memory buffer to guarantee that thread results are outputted in the same order in which the threads are received. A thread scheduler within the multi-threaded processor manages thread ordering control to avoid the need for an external reorder buffer. A compiler for the multi-threaded processor converts instructions that would normally send processed results directly to an external reorder buffer so that the processed thread results are instead sent to the internal memory buffer of the multi-threaded processor. (See the code sketch after this record.)
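
The deferred output control above amounts to an internal reorder step: results are buffered by arrival order and released only once everything older has been emitted. A minimal sketch, assuming a map keyed by sequence number stands in for the internal memory buffer and the thread scheduler's bookkeeping:

```cpp
// Illustrative sketch only: threads finish out of order, but results drain
// strictly in arrival order, so no external reorder buffer is needed.
#include <cstdio>
#include <map>

class InOrderOutput {
    std::map<int, int> pending_;   // sequence number -> result, waiting to drain
    int next_to_emit_ = 0;         // oldest sequence number not yet output
public:
    // Called when a thread completes; seq is the order in which it was received.
    void complete(int seq, int result) {
        pending_[seq] = result;
        // Drain every result that is now contiguous from the head.
        while (!pending_.empty() && pending_.begin()->first == next_to_emit_) {
            std::printf("emit thread %d -> %d\n", next_to_emit_, pending_.begin()->second);
            pending_.erase(pending_.begin());
            ++next_to_emit_;
        }
    }
};

int main() {
    InOrderOutput out;
    // Threads 0..4 were received in order but finish out of order.
    int finish_order[] = {2, 0, 3, 1, 4};
    for (int seq : finish_order) out.complete(seq, seq * 10);   // toy "results"
    return 0;                                                   // emits 0,1,2,3,4 in order
}
```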
    • 29. Invention grant
    • Title: Unified virtual addressed register file
    • Publication number: US08766996B2
    • Publication date: 2014-07-01
    • Application number: US11472701
    • Filing date: 2006-06-21
    • Inventors: Yun Du; Guofang Jiao; Chun Yu; De Dzwo Hsu
    • IPC: G09G5/36
    • CPC: G06F9/3851; G06F9/3012; G06F9/30123; G06F9/30138; G06F9/384; G06T15/005
    • Abstract: A multi-threaded processor is provided, such as a shader processor, having an internal unified memory space that is shared by a plurality of threads and is dynamically assigned to threads as needed. A mapping table maps virtual registers to available internal addresses in the unified memory space so that thread registers can be stored in contiguous or non-contiguous memory addresses. Dynamic sizing of the virtual registers allows flexible allocation of the unified memory space depending on the type and size of data in a thread register. Yet another feature provides an efficient method for storing graphics data in the unified memory space to improve fetch and store operations from the memory space. In particular, pixel data for four pixels in a thread are stored across four memory devices having independent input/output ports that permit the four pixels to be read in a single clock cycle for processing.
    • 30. Invention grant
    • Title: Graphics processors with parallel scheduling and execution of threads
    • Publication number: US08345053B2
    • Publication date: 2013-01-01
    • Application number: US11533880
    • Filing date: 2006-09-21
    • Inventors: Guofang Jiao; Yun Du; Chun Yu
    • IPC: G06F15/80; G06F15/00; G06T1/00
    • CPC: G06T15/005
    • Abstract: A graphics processor capable of parallel scheduling and execution of multiple threads, and techniques for achieving parallel scheduling and execution, are described. The graphics processor may include multiple hardware units and a scheduler. The hardware units are operable in parallel, with each hardware unit supporting a respective set of operations. The hardware units may include an ALU core, an elementary function core, a logic core, a texture sampler, a load control unit, some other hardware unit, or a combination thereof. The scheduler dispatches instructions for multiple threads to the hardware units concurrently. The graphics processor may further include an instruction cache to store instructions for threads and register banks to store data. The instruction cache and register banks may be shared by the hardware units.