专利快速检索-快速检索全球专利，免费商用专利数据库-IPRDB

31. 发明申请

US20110252204A1 SHARED SINGLE ACCESS MEMORY WITH MANAGEMENT OF MULTIPLE PARALLEL REQUESTS 有权
标题翻译：具有多个并行请求管理的共享单个访问记忆
公开(公告)号：US20110252204A1
公开(公告)日：2011-10-13
申请号：US13165638
申请日：2011-06-21
申请人： Brett W. Coon , Ming Y. Siu , Weizhong Xu , Stuart F. Oberman , John R. Nickolls , Peter C. Mills
发明人： Brett W. Coon , Ming Y. Siu , Weizhong Xu , Stuart F. Oberman , John R. Nickolls , Peter C. Mills
IPC分类号： G06F12/08
CPC分类号： G06F12/084 , Y02D10/13
摘要： A memory is used by concurrent threads in a multithreaded processor. Any addressable storage location is accessible by any of the concurrent threads, but only one location at a time is accessible. The memory is coupled to parallel processing engines that generate a group of parallel memory access requests, each specifying a target address that might be the same or different for different requests. Serialization logic selects one of the target addresses and determines which of the requests specify the selected target address. All such requests are allowed to proceed in parallel, while other requests are deferred. Deferred requests may be regenerated and processed through the serialization logic so that a group of requests can be satisfied by accessing each different target address in the group exactly once.
摘要翻译：多线程处理器中的并发线程使用内存。任何可寻址的存储位置都可以由任何并发线程访问，但一次只能访问一个位置。存储器耦合到并行处理引擎，其产生一组并行存储器访问请求，每个指定对于不同请求可能相同或不同的目标地址。序列化逻辑选择一个目标地址，并确定哪个请求指定所选择的目标地址。允许所有这些请求并行进行，而其他请求被推迟。可以通过序列化逻辑重新生成和处理延迟请求，以便通过一次访问组中的每个不同的目标地址来满足一组请求。

32. 发明申请

US20110078692A1 COALESCING MEMORY BARRIER OPERATIONS ACROSS MULTIPLE PARALLEL THREADS 有权
标题翻译：通过多个并行线程来解决存储器障碍操作
公开(公告)号：US20110078692A1
公开(公告)日：2011-03-31
申请号：US12887081
申请日：2010-09-21
申请人： John R. NICKOLLS , Steven James Heinrich , Brett W. Coon , Michael C. Shebanow
发明人： John R. NICKOLLS , Steven James Heinrich , Brett W. Coon , Michael C. Shebanow
IPC分类号： G06F9/46
CPC分类号： G06F9/3834 , G06F9/3004 , G06F9/30087 , G06F9/3851
摘要： One embodiment of the present invention sets forth a technique for coalescing memory barrier operations across multiple parallel threads. Memory barrier requests from a given parallel thread processing unit are coalesced to reduce the impact to the rest of the system. Additionally, memory barrier requests may specify a level of a set of threads with respect to which the memory transactions are committed. For example, a first type of memory barrier instruction may commit the memory transactions to a level of a set of cooperating threads that share an L1 (level one) cache. A second type of memory barrier instruction may commit the memory transactions to a level of a set of threads sharing a global memory. Finally, a third type of memory barrier instruction may commit the memory transactions to a system level of all threads sharing all system memories. The latency required to execute the memory barrier instruction varies based on the type of memory barrier instruction.
摘要翻译：本发明的一个实施例提出了一种用于在多个并行线程之间聚合存储器屏障操作的技术。来自给定并行线程处理单元的存储器屏障请求被合并以减少对系统其余部分的影响。此外，存储器屏障请求可以指定针对其提交内存事务的一组线程的级别。例如，第一类型的存储器障碍指令可以将存储器事务提交到共享L1（一级）高速缓存的一组协作线程的级别。第二种类型的存储器障碍指令可以将存储器事务提交到共享全局存储器的一组线程的级别。最后，第三种类型的存储器障碍指令可以将存储器事务提交到共享所有系统存储器的所有线程的系统级。执行存储器屏障指令所需的延迟基于存储器屏障指令的类型而变化。

33. 发明授权

US07864185B1 Register based queuing for texture requests 有权
标题翻译：基于注册排队的纹理请求
公开(公告)号：US07864185B1
公开(公告)日：2011-01-04
申请号：US12256848
申请日：2008-10-23
申请人： John Erik Lindholm , John R. Nickolls , Simon S. Moy , Brett W. Coon
发明人： John Erik Lindholm , John R. Nickolls , Simon S. Moy , Brett W. Coon
IPC分类号： G06T11/40 , G06T15/00 , G06T15/20 , G06T1/00
CPC分类号： G06T11/60 , G09G5/363
摘要： A graphics processing unit can queue a large number of texture requests to balance out the variability of texture requests without the need for a large texture request buffer. A dedicated texture request buffer queues the relatively small texture commands and parameters. Additionally, for each queued texture command, an associated set of texture arguments, which are typically much larger than the texture command, are stored in a general purpose register. The texture unit retrieves texture commands from the texture request buffer and then fetches the associated texture arguments from the appropriate general purpose register. The texture arguments may be stored in the general purpose register designated as the destination of the final texture value computed by the texture unit. Because the destination register must be allocated for the final texture value as texture commands are queued, storing the texture arguments in this register does not consume any additional registers.
摘要翻译：图形处理单元可以排队大量纹理请求，以平衡纹理请求的可变性，而不需要大的纹理请求缓冲区。专用纹理请求缓冲区排队相对较小的纹理命令和参数。另外，对于每个排队的纹理命令，通常比纹理命令大得多的一组相关的纹理参数存储在通用寄存器中。纹理单元从纹理请求缓冲区中检索纹理命令，然后从相应的通用寄存器获取相关的纹理参数。纹理参数可以存储在指定为由纹理单元计算的最终纹理值的目的地的通用寄存器中。因为当纹理命令排队时，必须为目标寄存器分配最终纹理值，所以将纹理参数存储在该寄存器中不消耗任何其他寄存器。

34. 发明授权

US07711990B1 Apparatus and method for debugging a graphics processing unit in response to a debug instruction 有权
标题翻译：响应于调试指令调试图形处理单元的装置和方法
公开(公告)号：US07711990B1
公开(公告)日：2010-05-04
申请号：US11302952
申请日：2005-12-13
申请人： John R. Nickolls , Roger L. Allen , Brian K. Cabral , Brett W. Coon , Robert C. Keller
发明人： John R. Nickolls , Roger L. Allen , Brian K. Cabral , Brett W. Coon , Robert C. Keller
IPC分类号： G06F11/00
CPC分类号： G06F11/3648
摘要： A system includes a graphics processing unit with a processor responsive to a debug instruction that initiates the storage of execution state information. A memory stores the execution state information. A central processing unit executes a debugging program to analyze the execution state information.
摘要翻译：系统包括具有处理器的图形处理单元，该处理器响应于启动执行状态信息的存储的调试指令。存储器存储执行状态信息。中央处理单元执行调试程序以分析执行状态信息。

35. 发明授权

US07634621B1 Register file allocation 有权
标题翻译：注册文件分配
公开(公告)号：US07634621B1
公开(公告)日：2009-12-15
申请号：US11556677
申请日：2006-11-03
申请人： Brett W. Coon , John Erik Lindholm , Gary Tarolli , Svetoslav D. Tzvetkov , John R. Nickolls , Ming Y. Siu
发明人： Brett W. Coon , John Erik Lindholm , Gary Tarolli , Svetoslav D. Tzvetkov , John R. Nickolls , Ming Y. Siu
IPC分类号： G06F12/00
CPC分类号： G06F9/3012 , G06F9/30123 , G06F9/3824 , G06F9/3851 , G06F9/3885 , G06F12/0223 , Y02D10/13
摘要： Circuits, methods, and apparatus that provide the die area and power savings of a single-ported memory with the performance advantages of a multiported memory. One example provides register allocation methods for storing data in a multiple-bank register file. In a thin register allocation method, data for a process is stored in a single bank. In this way, different processes use different banks to avoid conflicts. In a fat register allocation method, processes store data in each bank. In this way, if one process uses a large number of registers, those registers are spread among the banks, avoiding a situation where one bank is filled and other processes are forced to share a reduced number of banks. In a hybrid register allocation method, processes store data in more than one bank, but fewer than all the banks. Each of these methods may be combined in varying ways.
摘要翻译：提供具有多端口存储器性能优势的单端口存储器的管芯面积和功率节省的电路，方法和装置。一个示例提供用于将数据存储在多存储器寄存器文件中的寄存器分配方法。在一个薄的寄存器分配方法中，一个进程的数据被存储在一个单独的存储单元中。以这种方式，不同的流程使用不同的银行来避免冲突。在胖寄存器分配方法中，处理将数据存储在每个存储区中。这样一来，如果一个进程使用大量的寄存器，这些寄存器就会在银行之间传播，避免了一个银行被填满的情况，而其他进程被迫分担一个数量减少的银行。在混合寄存器分配方法中，处理将数据存储在多个银行中，但少于所有银行。这些方法中的每一种可以以不同的方式组合。

36. 发明申请

US20090240860A1 Lock Mechanism to Enable Atomic Updates to Shared Memory 有权
标题翻译：锁定机制来启用共享内存的原子更新
公开(公告)号：US20090240860A1
公开(公告)日：2009-09-24
申请号：US12054267
申请日：2008-03-24
申请人： Brett W. Coon , John R. Nickolls , Lars Nyland , Peter C. Mills
发明人： Brett W. Coon , John R. Nickolls , Lars Nyland , Peter C. Mills
IPC分类号： G06F12/14
CPC分类号： G06F12/084 , G06F9/3004 , G06F9/30087 , G06F9/30185 , G06F9/526 , G06F2209/521
摘要： A system and method for locking and unlocking access to a shared memory for atomic operations provides immediate feedback indicating whether or not the lock was successful. Read data is returned to the requestor with the lock status. The lock status may be changed concurrently when locking during a read or unlocking during a write. Therefore, it is not necessary to check the lock status as a separate transaction prior to or during a read-modify-write operation. Additionally, a lock or unlock may be explicitly specified for each atomic memory operation. Therefore, lock operations are not performed for operations that do not modify the contents of a memory location.
摘要翻译：用于锁定和解锁对原子操作的共享存储器的访问的系统和方法提供指示锁是否成功的即时反馈。读取数据将返回给具有锁定状态的请求者。在写入期间在读取或解锁期间锁定时，锁定状态可能会同时更改。因此，在读取 - 修改 - 写入操作之前或期间，不必将锁定状态检查为单独的事务。另外，可以为每个原子存储器操作明确地指定锁定或解锁。因此，对于不修改内存位置的内容的操作，不执行锁定操作。

37. 发明授权

US09189242B2 Credit-based streaming multiprocessor warp scheduling 有权
标题翻译：基于信用流的多处理器扭曲调度
公开(公告)号：US09189242B2
公开(公告)日：2015-11-17
申请号：US12885299
申请日：2010-09-17
申请人： John Erik Lindholm , Brett W. Coon , Jered Wierzbicki , Robert J. Stoll , Stuart F. Oberman
发明人： John Erik Lindholm , Brett W. Coon , Jered Wierzbicki , Robert J. Stoll , Stuart F. Oberman
IPC分类号： G06F9/50 , G06F9/38
CPC分类号： G06F9/3851 , G06F9/3836 , G06F9/3885 , G06F9/3887 , G06F9/3889
摘要： One embodiment of the present invention sets forth a technique for ensuring cache access instructions are scheduled for execution in a multi-threaded system to improve cache locality and system performance. A credit-based technique may be used to control instruction by instruction scheduling for each warp in a group so that the group of warps is processed uniformly. A credit is computed for each warp and the credit contributes to a weight for each warp. The weight is used to select instructions for the warps that are issued for execution.
摘要翻译：本发明的一个实施例提出了一种用于确保高速缓存访问指令被调度用于在多线程系统中执行以提高高速缓存位置和系统性能的技术。可以使用基于信用的技术来对组中的每个翘曲的指令调度来控制指令，使得一组经线被均匀地处理。对每个经纱计算信用额度，并且信用额度有助于每个经线的权重。权重用于选择要执行的经纱的说明。

38. 发明授权

US08572355B2 Support for non-local returns in parallel thread SIMD engine 有权
标题翻译：支持并行线程SIMD引擎中的非本地返回
公开(公告)号：US08572355B2
公开(公告)日：2013-10-29
申请号：US12881065
申请日：2010-09-13
申请人： Guillermo Juan Rozas , Brett W. Coon
发明人： Guillermo Juan Rozas , Brett W. Coon
IPC分类号： G06F9/30
CPC分类号： G06F9/30058 , G06F9/3851
摘要： One embodiment of the present invention sets forth a method for executing a non-local return instruction in a parallel thread processor. The method comprises the steps of receiving, within the thread group, a first long jump instruction and, in response, popping a first token from the execution stack. The method also comprises determining whether the first token is a first long jump token that was pushed onto the execution stack when a first push instruction associated with the first long jump instruction was executed, and when the first token is the first long jump token, jumping to the second instruction based on the address specified by the first long jump token, or, when the first token is not the first long jump token, disabling the active thread until the first long jump token is popped from the execution stack.
摘要翻译：本发明的一个实施例提出了一种用于在并行线程处理器中执行非本地返回指令的方法。该方法包括以下步骤：在线程组内接收第一长跳转指令，作为响应，从执行堆栈中弹出第一个令牌。该方法还包括当与第一长跳转指令相关联的第一推送指令被执行时，确定第一令牌是否是被推送到执行堆栈上的第一长跳转令牌，以及当第一令牌是第一长跳转令牌时，跳转基于由第一长跳转令牌指定的地址到第二指令，或者当第一令牌不是第一长跳转令牌时，禁用活动线程，直到从执行堆栈弹出第一个长跳转令牌。

39. 发明授权

US08405665B2 Programmable graphics processor for multithreaded execution of programs 有权
标题翻译：用于多线程执行程序的可编程图形处理器
公开(公告)号：US08405665B2
公开(公告)日：2013-03-26
申请号：US13466043
申请日：2012-05-07
申请人： John Erik Lindholm , Brett W. Coon , Stuart F. Oberman , Ming Y. Siu , Matthew P. Gerlach
发明人： John Erik Lindholm , Brett W. Coon , Stuart F. Oberman , Ming Y. Siu , Matthew P. Gerlach
IPC分类号： G06F15/16 , G06F15/80 , G06F13/14 , G06T1/20
CPC分类号： G06T15/005
摘要： A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.
摘要翻译：处理单元包括多个执行流水线，每个执行流水线连接到第一输入部分，用于接收用于像素处理的输入数据和用于接收用于顶点处理的输入数据的第二输入部分和用于存储经处理的像素数据的第一输出部分和用于存储经处理的顶点数据的第二输出部分。经处理的顶点数据被光栅化并扫描转换为用作像素处理的输入数据的像素数据。经处理的像素数据被输出到光栅分析器。

40. 发明授权

US08225076B1 Scoreboard having size indicators for tracking sequential destination register usage in a multi-threaded processor 有权
标题翻译：记分牌具有用于跟踪多线程处理器中的顺序目的地寄存器使用的大小指示符
公开(公告)号：US08225076B1
公开(公告)日：2012-07-17
申请号：US12233515
申请日：2008-09-18
申请人： Brett W. Coon , Peter C. Mills , Stuart F. Oberman , Ming Y. Siu
发明人： Brett W. Coon , Peter C. Mills , Stuart F. Oberman , Ming Y. Siu
IPC分类号： G06F9/30
CPC分类号： G06F9/3851 , G06F9/3838 , G06F9/3879 , G06F9/3885
摘要： A scoreboard memory for a processing unit has separate memory regions allocated to each of the multiple threads to be processed. For each thread, the scoreboard memory stores register identifiers of registers that have pending writes. When an instruction is added to an instruction buffer, the register identifiers of the registers specified in the instruction are compared with the register identifiers stored in the scoreboard memory for that instruction's thread, and a multi-bit value representing the comparison result is generated. The multi-bit value is stored with the instruction in the instruction buffer and may be updated as instructions belonging to the same thread complete their execution. Before the instruction is issued for execution, this multi-bit value is checked. If this multi-bit value indicates that none of the registers specified in the instruction have pending writes, the instruction is allowed to issue for execution.
摘要翻译：用于处理单元的记分板存储器具有分配给要处理的多个线程中的每一个的分离的存储器区域。对于每个线程，记分板存储器存储具有待处理写入的寄存器的寄存器标识符。当指令被添加到指令缓冲器中时，将指令中指定的寄存器的寄存器标识符与存储在该指令的线程的记分板存储器中的寄存器标识进行比较，并生成表示比较结果的多位值。多位值与指令一起存储在指令缓冲器中，并且可以更新为属于同一线程的指令完成其执行。在执行指令之前，将检查该多位值。如果该多位值表示指令中没有指定的寄存器没有挂起写操作，则允许指令执行。

你已经成功收藏专利！

检索式保存成功!

IPRDB

热门服务

关于我们

友情链接

联系方式