    • 41. Granted invention patent
    • Dependent instruction thread scheduling
    • US08291431B2
    • 2012-10-16
    • US11468221
    • 2006-08-29
    • Yun Du, Guofang Jiao, Chun Yu
    • G06F9/46, G06F9/40
    • G06F9/3851, G06F9/3824, G06F9/3838
    • A thread scheduler includes context units for managing the execution of threads where each context unit includes a load reference counter for maintaining a counter value indicative of a difference between a number of data requests and a number of data returns associated with the particular context unit. A context controller of the thread context unit is configured to refrain from forwarding an instruction of a thread when the counter value is nonzero and the instruction includes a data dependency indicator indicating the instruction requires data returned by a previous instruction.
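The gating rule in this abstract can be sketched in a few lines. This is a minimal Python model under stated assumptions, not the patented hardware: the names `ContextUnit`, `on_data_request`, `on_data_return`, and `can_forward` are hypothetical, and an instruction is modeled as a dict whose boolean `depends_on_load` field stands in for the data dependency indicator.

```python
class ContextUnit:
    """Toy model of one thread context unit (all names are hypothetical)."""

    def __init__(self):
        # Load reference counter: data requests issued minus data returns received.
        self.load_ref_count = 0

    def on_data_request(self):
        self.load_ref_count += 1

    def on_data_return(self):
        if self.load_ref_count == 0:
            raise RuntimeError("data return without an outstanding request")
        self.load_ref_count -= 1

    def can_forward(self, instruction):
        # Hold the instruction only when it is flagged as dependent
        # AND at least one data request is still outstanding.
        return not (instruction["depends_on_load"] and self.load_ref_count != 0)


ctx = ContextUnit()
load = {"op": "load", "depends_on_load": False}
use = {"op": "add", "depends_on_load": True}

ctx.on_data_request()              # the load is issued
blocked = ctx.can_forward(use)     # dependent instruction while data is pending
independent = ctx.can_forward(load)  # unflagged instruction while data is pending
ctx.on_data_return()               # the data arrives
released = ctx.can_forward(use)    # dependent instruction after the return
```

In this model an instruction without the dependency flag is never held, so independent work from the same thread can keep flowing while a load is in flight.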
    • 42. Invention patent application
    • 3-D clipping in a graphics processing unit
    • US20120256921A1
    • 2012-10-11
    • US13524946
    • 2012-06-15
    • Guofang Jiao, Chun Yu, Lingjun Chen, Yun Du
    • G06T17/00
    • G06T1/20, G06T11/40, G06T11/60, G06T15/005, G06T15/30, G06T19/00, G09G5/393
    • A graphics processing unit (GPU) efficiently performs 3-dimensional (3-D) clipping using processing units used for other graphics functions. The GPU includes first and second hardware units and at least one buffer. The first hardware unit performs 3-D clipping of primitives using a first processing unit used for a first graphics function, e.g., an ALU used for triangle setup, depth gradient setup, etc. The first hardware unit may perform 3-D clipping by (a) computing clip codes for each vertex of each primitive, (b) determining whether to pass, discard or clip each primitive based on the clip codes for all vertices of the primitive, and (c) clipping each primitive to be clipped against clipping planes. The second hardware unit computes attribute component values for new vertices resulting from the 3-D clipping, e.g., using an ALU used for attribute gradient setup, attribute interpolation, etc. The buffer(s) store intermediate results of the 3-D clipping.
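Steps (a) and (b) of this abstract resemble the familiar outcode test, which can be sketched as follows. The sketch assumes an OpenGL-style clip volume (-w <= x, y, z <= w) and one bit per plane; neither detail comes from the abstract, and the function names `clip_code` and `classify` are hypothetical. Step (c), the actual clipping against the planes, is not shown.

```python
# One bit per clip plane; assumes the clip volume -w <= x, y, z <= w.
LEFT, RIGHT, BOTTOM, TOP, NEAR, FAR = 1, 2, 4, 8, 16, 32


def clip_code(vertex):
    """Return a bitmask of the planes this (x, y, z, w) vertex lies outside."""
    x, y, z, w = vertex
    code = 0
    if x < -w:
        code |= LEFT
    if x > w:
        code |= RIGHT
    if y < -w:
        code |= BOTTOM
    if y > w:
        code |= TOP
    if z < -w:
        code |= NEAR
    if z > w:
        code |= FAR
    return code


def classify(primitive):
    """Decide 'pass', 'discard', or 'clip' from the codes of all vertices."""
    any_outside = 0
    all_outside = ~0
    for vertex in primitive:
        code = clip_code(vertex)
        any_outside |= code
        all_outside &= code
    if any_outside == 0:
        return "pass"      # every vertex is inside every plane
    if all_outside != 0:
        return "discard"   # all vertices are outside the same plane
    return "clip"          # the primitive straddles at least one plane


inside = [(0, 0, 0, 1), (0.5, 0, 0, 1), (0, 0.5, 0, 1)]
outside = [(2, 0, 0, 1), (3, 0, 0, 1), (2, 1, 0, 1)]
straddling = [(0, 0, 0, 1), (2, 0, 0, 1), (0, 0.5, 0, 1)]
```

Only primitives classified as `"clip"` would need the more expensive plane-intersection work.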
    • 43. Granted invention patent
    • Tiled cache for multiple software programs
    • US08035650B2
    • 2011-10-11
    • US11493444
    • 2006-07-25
    • Yun Du, Guofang Jiao, Chun Yu, De Dzwo Hsu
    • G09G5/36, G06F15/167, G06F13/00, G06F13/28
    • G06F12/0864, G06F9/3802, G06F9/3851, G06F12/0842
    • Caching techniques for storing instructions, constant values, and other types of data for multiple software programs are described. A cache provides storage for multiple programs and is partitioned into multiple tiles. Each tile is assignable to one program. Each program may be assigned any number of tiles based on the program's cache usage, the available tiles, and/or other factors. A cache controller identifies the tiles assigned to the programs and generates cache addresses for accessing the cache. The cache may be partitioned into physical tiles. The cache controller may assign logical tiles to the programs and may map the logical tiles to the physical tiles within the cache. The use of logical and physical tiles may simplify assignment and management of the tiles.
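The logical-to-physical tile mapping described in this abstract can be illustrated with a small model. Everything concrete here is an assumption made for the sketch: the class name `TiledCache`, first-free tile allocation, and the address formula (physical tile index times tile size, plus the offset within the tile).

```python
class TiledCache:
    """Toy cache controller with logical-to-physical tile mapping (hypothetical)."""

    def __init__(self, num_tiles, tile_size):
        self.tile_size = tile_size
        self.free_tiles = list(range(num_tiles))  # unassigned physical tiles
        self.tile_map = {}  # program -> physical tiles, indexed by logical tile

    def assign(self, program, count):
        """Give `program` `count` more tiles; each tile belongs to one program."""
        if count > len(self.free_tiles):
            raise ValueError("not enough free tiles")
        tiles = [self.free_tiles.pop(0) for _ in range(count)]
        self.tile_map.setdefault(program, []).extend(tiles)

    def cache_address(self, program, program_address):
        """Translate an address in the program's own space to a cache address."""
        logical_tile, offset = divmod(program_address, self.tile_size)
        physical_tile = self.tile_map[program][logical_tile]
        return physical_tile * self.tile_size + offset


cache = TiledCache(num_tiles=4, tile_size=16)
cache.assign("A", 1)   # A gets physical tile 0
cache.assign("B", 2)   # B gets physical tiles 1 and 2
cache.assign("A", 1)   # A grows into physical tile 3

addr_a = cache.cache_address("A", 17)  # A's logical tile 1, offset 1
addr_b = cache.cache_address("B", 5)   # B's logical tile 0, offset 5
```

Program A sees a contiguous logical address range even though its physical tiles (0 and 3) are not adjacent, which is the kind of simplification the abstract attributes to logical tiles.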
    • 44. Granted invention patent
    • Programmable blending in a graphics processing unit
    • US07973797B2
    • 2011-07-05
    • US11550958
    • 2006-10-19
    • Guofang Jiao, Chun Yu, Lingjun Chen, Yun Du
    • G09G5/00, G09G5/02, G06T15/50, G06T15/60
    • G06T15/503, G06T2210/32
    • Techniques for implementing blending equations for various blending modes with a base set of operations are described. Each blending equation may be decomposed into a sequence of operations. In one design, a device includes a processing unit that implements a set of operations for multiple blending modes and a storage unit that stores operands and results. The processing unit receives a sequence of instructions for a sequence of operations for a blending mode selected from the plurality of blending modes and executes each instruction in the sequence to perform blending in accordance with the selected blending mode. The processing unit may include (a) an ALU that performs at least one operation in the base set, e.g., a dot product, (b) a pre-formatting unit that performs gamma correction and alpha scaling of inbound color values, and (c) a post-formatting unit that performs gamma compression and alpha scaling of outbound color values.
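The idea of decomposing each blending equation into a sequence of operations from a small base set can be sketched with a tiny interpreter. The base set (`mul`, `add`, `sub`), the two blending modes, the register names, and the single-component `blend` function are all hypothetical choices for illustration; the abstract itself names only a dot product as an example operation, and the gamma and alpha-scaling stages are left out.

```python
# Hypothetical base set of operations shared by all blending modes.
BASE_OPS = {
    "mul": lambda a, b: a * b,
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
}

# Each blending mode is a sequence of (op, dest, operand_a, operand_b) instructions.
BLEND_PROGRAMS = {
    # out = src * alpha + dst * (1 - alpha)
    "source_over": [
        ("mul", "t0", "src", "alpha"),
        ("sub", "t1", "one", "alpha"),
        ("mul", "t1", "dst", "t1"),
        ("add", "out", "t0", "t1"),
    ],
    # out = src + dst
    "additive": [
        ("add", "out", "src", "dst"),
    ],
}


def blend(mode, src, dst, alpha):
    """Run the instruction sequence for `mode` on one color component."""
    # The dict plays the role of the storage unit for operands and results.
    regs = {"src": src, "dst": dst, "alpha": alpha, "one": 1.0}
    for op, dest, a, b in BLEND_PROGRAMS[mode]:
        regs[dest] = BASE_OPS[op](regs[a], regs[b])
    return regs["out"]


over = blend("source_over", src=1.0, dst=0.0, alpha=0.5)
added = blend("additive", src=0.25, dst=0.5, alpha=1.0)
```

Adding a new blending mode in this model means adding one more instruction sequence, with no change to the interpreter or the base set.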
    • 45. Granted invention patent
    • Graphics processing unit with unified vertex cache and shader register file
    • US07928990B2
    • 2011-04-19
    • US11535809
    • 2006-09-27
    • Guofang Jiao, Chun Yu, Yun Du
    • G09G5/36
    • G06T15/005
    • Techniques are described for processing computerized images with a graphics processing unit (GPU) using a unified vertex cache and shader register file. The techniques include creating a shared shader coupled to the GPU pipeline and a unified vertex cache and shader register file coupled to the shared shader to substantially eliminate data movement within the GPU pipeline. The GPU pipeline sends image geometry information based on an image geometry for an image to the shared shader. The shared shader performs vertex shading to generate vertex coordinates and attributes of vertices in the image. The shared shader then stores the vertex attributes in the unified vertex cache and shader register file, and sends only the vertex coordinates of the vertices back to the GPU pipeline. The GPU pipeline processes the image based on the vertex coordinates, and the shared shader processes the image based on the vertex attributes.
    • 46. Granted invention patent
    • Relative address generation
    • US07805589B2
    • 2010-09-28
    • US11469347
    • 2006-08-31
    • Yun Du, Chun Yu, Guofang Jiao
    • G06F12/00
    • G06F12/06, G06F9/345, G06F9/355, G06F9/3802, G06F9/3875
    • Techniques to efficiently handle relative addressing are described. In one design, a processor includes an address generator and a storage unit. The address generator receives a relative address comprised of a base address and an offset, obtains a base value for the base address, sums the base value with the offset, and provides an absolute address corresponding to the relative address. The storage unit receives the base address and provides the base value to the address generator. The storage unit also receives the absolute address and provides data at this address. The address generator may derive the absolute address in a first clock cycle of a memory access. The storage unit may provide the data in a second clock cycle of the memory access. The storage unit may have multiple (e.g., two) read ports to support concurrent address generation and data retrieval.
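The two-step access described in this abstract (resolve the base, then fetch the data) can be modeled in software. This is only a sketch: the names `StorageUnit`, `absolute_address`, and `load_relative` are hypothetical, the storage unit is a flat list holding both base values and data, and the two clock cycles are represented simply as two sequential reads.

```python
class StorageUnit:
    """Toy storage unit that holds both base values and data (hypothetical layout)."""

    def __init__(self, cells):
        self.cells = list(cells)

    def read(self, address):
        return self.cells[address]


def absolute_address(storage, base_address, offset):
    """Address generator: look up the base value, then add the offset."""
    base_value = storage.read(base_address)
    return base_value + offset


def load_relative(storage, base_address, offset):
    """Two-step access: derive the absolute address, then fetch the data."""
    address = absolute_address(storage, base_address, offset)  # first cycle
    return storage.read(address)                                # second cycle


# Cell 0 holds the base value 4; cells 4..7 hold the data.
storage = StorageUnit([4, 0, 0, 0, 10, 11, 12, 13])
resolved = absolute_address(storage, base_address=0, offset=2)
value = load_relative(storage, base_address=0, offset=2)
```

With two read ports, as the abstract suggests, the base lookup for one access and the data fetch for the previous access could happen in the same cycle; the sequential model above does not capture that overlap.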
    • 47. Invention patent application
    • Programmable graphics processing element
    • US20080252652A1
    • 2008-10-16
    • US11735353
    • 2007-04-13
    • Guofang Jiao, Lingjun Chen, Chun Yu, Yun Du
    • G09G5/00
    • G06T15/005, G06T15/40, G06T15/503
    • In general, this disclosure describes techniques for performing graphics operations using programmable processing units in a graphics processing unit (GPU). As described herein, a GPU includes a graphics pipeline that includes a programmable graphics processing element (PGPE). In accordance with the techniques described herein, an arbitrary set of instructions is loaded into the PGPE. Subsequently, the PGPE may execute the set of instructions in order to generate a new pixel object. A pixel object describes a displayable pixel. The new pixel object may represent a result of performing a graphics operation on a first pixel object. A display device may display a pixel described by the new pixel object.
    • 48. Invention patent application
    • Processor with adaptive multi-shader
    • US20080235316A1
    • 2008-09-25
    • US11690358
    • 2007-03-23
    • Yun Du, Guofang Jiao, Chun Yu
    • G06F7/38
    • G06T15/005
    • The disclosure describes an adaptive multi-shader within a processor that uses one or more high-precision arithmetic logic units (ALUs) and low-precision ALUs to process data based on the type of the data. Upon receiving a stream of data, the adaptive multi-shader first determines the type of the data. For example, the adaptive multi-shader may determine whether the data is suitable for high-precision processing or low-precision processing. The adaptive multi-shader then processes the data using the high-precision ALUs when the data is suitable for high-precision processing, and processes the data using the high-precision ALUs and the low-precision ALUs when the data is suitable for low-precision processing. The adaptive multi-shader may substantially reduce power consumption and silicon size of the processor by implementing the low-precision ALUs while maintaining the ability to process data using high-precision processing by implementing the high-precision ALUs.
    • 49. Invention patent application
    • Multi-threaded processor with deferred thread output control
    • US20070283356A1
    • 2007-12-06
    • US11445100
    • 2006-05-31
    • Yun Du, Guofang Jiao, Chun Yu
    • G06F9/46
    • G06F9/4881, G06F9/30123, G06F9/3836, G06F9/3851, G06F9/3855, G06F9/3857, Y02D10/24
    • A multi-threaded processor is provided that internally reorders output threads thereby avoiding the need for an external output reorder buffer. The multi-threaded processor writes its thread results back to an internal memory buffer to guarantee that thread results are outputted in the same order in which the threads are received. A thread scheduler within the multi-threaded processor manages thread ordering control to avoid the need for an external reorder buffer. A compiler for the multi-threaded processor converts instructions that would normally send processed results directly to an external reorder buffer so that the processed thread results are instead sent to the internal memory buffer of the multi-threaded processor.
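The ordering guarantee in this abstract (results leave in the order the threads arrived, even if they finish out of order) can be sketched with a small buffer model. The class `OrderedOutputBuffer`, its sequence-number scheme, and its method names are hypothetical; the abstract does not say how the internal memory buffer or the scheduler tracks order.

```python
class OrderedOutputBuffer:
    """Toy internal buffer that releases thread results in arrival order."""

    def __init__(self):
        self.next_ticket = 0   # sequence number given to the next received thread
        self.next_output = 0   # sequence number of the next result to emit
        self.results = {}      # ticket -> result, filled as threads finish

    def receive_thread(self):
        """Register an incoming thread and return its sequence number."""
        ticket = self.next_ticket
        self.next_ticket += 1
        return ticket

    def write_back(self, ticket, result):
        """Store a finished thread's result; completion order may differ."""
        self.results[ticket] = result

    def drain(self):
        """Emit every result that is ready without skipping an earlier thread."""
        out = []
        while self.next_output in self.results:
            out.append(self.results.pop(self.next_output))
            self.next_output += 1
        return out


buf = OrderedOutputBuffer()
t0, t1, t2 = buf.receive_thread(), buf.receive_thread(), buf.receive_thread()

buf.write_back(t2, "c")
first = buf.drain()    # thread 0 has not finished, so nothing is released
buf.write_back(t0, "a")
second = buf.drain()   # thread 0 is released; thread 2 still waits for thread 1
buf.write_back(t1, "b")
third = buf.drain()    # threads 1 and 2 are released together, in order
```

Because the reordering happens inside the buffer, a consumer of `drain()` never needs its own reorder stage, which mirrors the abstract's point about avoiding an external reorder buffer.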
    • 50. Granted invention patent
    • Multi-threaded processor with deferred thread output control
    • US08869147B2
    • 2014-10-21
    • US11445100
    • 2006-05-31
    • Yun Du, Guofang Jiao, Chun Yu
    • G06F9/46, G06F9/48, G06F9/30, G06F9/38
    • G06F9/4881, G06F9/30123, G06F9/3836, G06F9/3851, G06F9/3855, G06F9/3857, Y02D10/24
    • A multi-threaded processor is provided that internally reorders output threads thereby avoiding the need for an external output reorder buffer. The multi-threaded processor writes its thread results back to an internal memory buffer to guarantee that thread results are outputted in the same order in which the threads are received. A thread scheduler within the multi-threaded processor manages thread ordering control to avoid the need for an external reorder buffer. A compiler for the multi-threaded processor converts instructions that would normally send processed results directly to an external reorder buffer so that the processed thread results are instead sent to the internal memory buffer of the multi-threaded processor.