Quick Patent Search - fast search of global patents, free commercial-use patent database - IPRDB

51. Granted patent

US08074224B1 Managing state information for a multi-threaded processor (in force)
Publication No.: US08074224B1
Publication date: 2011-12-06
Application No.: US11311963
Filing date: 2005-12-19
Applicants: Bryon S. Nordquist, Brett W. Coon
Inventors: Bryon S. Nordquist, Brett W. Coon
IPC classification: G06F9/46, G06F9/40, G06F15/76
CPC classification: G06F9/52, G06F9/3012, G06F9/30123, G06F9/3851, G06F9/3853, G06F9/3887, G06T1/20
Abstract: Embodiments of the present invention facilitate dynamically adapting to state information changes in a graphics processing environment. In one embodiment, a master register holds state information corresponding to units of work (threads) to be performed. The state information in the master register is copied to a per-group state register when a group of threads is to be launched. The per-group state register is coupled to processing engines configured to process the threads, so that the processing engines read state information from the per-group state register rather than the master register. In another embodiment, a number of master registers may be used to store state information for different types of threads.
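
The copy-at-launch behavior in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the class names `MasterRegister` and `ThreadGroup`, the function `launch_group`, and the state keys `shader` and `viewport` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MasterRegister:
    """Holds the current state for thread groups that have not launched yet."""
    state: dict = field(default_factory=dict)

    def update(self, **changes):
        self.state.update(changes)


@dataclass(frozen=True)
class ThreadGroup:
    """A launched group with its own copy of the state (per-group register)."""
    group_id: int
    state: dict

    def read(self, key):
        # Processing engines read the per-group copy, never the master.
        return self.state[key]


def launch_group(group_id, master):
    # Copy the master state at launch so later updates cannot leak in.
    return ThreadGroup(group_id, dict(master.state))


master = MasterRegister({"shader": "vs_a", "viewport": 0})
g0 = launch_group(0, master)
master.update(shader="vs_b")   # state change arrives after g0 has launched
g1 = launch_group(1, master)
print(g0.read("shader"), g1.read("shader"))  # vs_a vs_b
```

Because each group reads its own snapshot, the master register can be updated for the next group without waiting for earlier groups to finish.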

52. Patent application

US20110173414A1 MAXIMIZED MEMORY THROUGHPUT ON PARALLEL PROCESSING DEVICES (in force)
Publication No.: US20110173414A1
Publication date: 2011-07-14
Application No.: US13069384
Filing date: 2011-03-23
Applicants: Norbert Juffa, Brett W. Coon
Inventors: Norbert Juffa, Brett W. Coon
IPC classification: G06F9/38
CPC classification: G06F9/3887, G06F9/3455, G06F9/3851, G06F9/3889
Abstract: In parallel processing devices, for streaming computations, processing of each data element of the stream may not be computationally intensive and thus processing may take relatively small amounts of time to compute as compared to memory accesses times required to read the stream and write the results. Therefore, memory throughput often limits the performance of the streaming computation. Generally stated, provided are methods for achieving improved, optimized, or ultimately, maximized memory throughput in such memory-throughput-limited streaming computations. Streaming computation performance is maximized by improving the aggregate memory throughput across the plurality of processing elements and threads. High aggregate memory throughput is achieved by balancing processing loads between threads and groups of threads and a hardware memory interface coupled to the parallel processing devices.

53. Patent application

US20110078417A1 COOPERATIVE THREAD ARRAY REDUCTION AND SCAN OPERATIONS (in force)
Publication No.: US20110078417A1
Publication date: 2011-03-31
Application No.: US12890227
Filing date: 2010-09-24
Applicants: Brian Fahs, Ming Y. Siu, Brett W. Coon, John R. Nickolls, Lars Nyland
Inventors: Brian Fahs, Ming Y. Siu, Brett W. Coon, John R. Nickolls, Lars Nyland
IPC classification: G06F9/38
CPC classification: G06F9/522, G06F8/458, G06F9/3004, G06F9/30087, G06F9/30145, G06F9/3851
Abstract: One embodiment of the present invention sets forth a technique for performing aggregation operations across multiple threads that execute independently. Aggregation is specified as part of a barrier synchronization or barrier arrival instruction, where in addition to performing the barrier synchronization or arrival, the instruction aggregates (using reduction or scan operations) values supplied by each thread. When a thread executes the barrier aggregation instruction the thread contributes to a scan or reduction result, and waits to execute any more instructions until after all of the threads have executed the barrier aggregation instruction. A reduction result is communicated to each thread after all of the threads have executed the barrier aggregation instruction and a scan result is communicated to each thread as the barrier aggregation instruction is executed by the thread.
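
A barrier that also aggregates values, as this abstract describes, can be approximated in software with Python's `threading.Barrier`. The sketch below is an analogy under assumptions rather than the claimed instruction: `AggregatingBarrier`, `arrive_and_wait`, and `run` are hypothetical names, the operator is fixed to addition, the scan is exclusive and ordered by arrival, and the barrier is single-use.

```python
import threading


class AggregatingBarrier:
    """Single-use barrier that also sums the values of arriving threads."""

    def __init__(self, parties):
        self._barrier = threading.Barrier(parties)
        self._lock = threading.Lock()
        self._total = 0

    def arrive_and_wait(self, value):
        with self._lock:
            scan = self._total      # exclusive scan: sum of earlier arrivals
            self._total += value    # contribute to the reduction
        self._barrier.wait()        # block until every thread has arrived
        return scan, self._total    # the total is final once the barrier opens


def run(values):
    barrier = AggregatingBarrier(len(values))
    results = [None] * len(values)

    def worker(i):
        results[i] = barrier.arrive_and_wait(values[i])

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(len(values))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


results = run([3, 1, 4, 1, 5])
print({total for _, total in results})  # {14}
```

Each thread's scan value is fixed at the moment it arrives, so it depends on arrival order, while the reduction is only read after the barrier releases and is therefore the same for every thread.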

54. Patent application

US20110078381A1 Cache Operations and Policies For A Multi-Threaded Client (in force)
Publication No.: US20110078381A1
Publication date: 2011-03-31
Application No.: US12890476
Filing date: 2010-09-24
Applicants: Steven James Heinrich, Alexander L. Minkin, Brett W. Coon, Rajeshwaran Selvanesan, Robert Steven Glanville, Charles McCarver, Anjana Rajendran, Stewart Glenn Carlton, John R. Nickolls, Brian Fahs
Inventors: Steven James Heinrich, Alexander L. Minkin, Brett W. Coon, Rajeshwaran Selvanesan, Robert Steven Glanville, Charles McCarver, Anjana Rajendran, Stewart Glenn Carlton, John R. Nickolls, Brian Fahs
IPC classification: G06F12/08, G06F12/00
CPC classification: G06F12/0842, G06F12/0897
Abstract: A method for managing a parallel cache hierarchy in a processing unit. The method including receiving an instruction that includes a cache operations modifier that identifies a level of the parallel cache hierarchy in which to cache data associated with the instruction; and implementing a cache replacement policy based on the cache operations modifier.

55. Patent application

US20110072213A1 INSTRUCTIONS FOR MANAGING A PARALLEL CACHE HIERARCHY (in force)
Publication No.: US20110072213A1
Publication date: 2011-03-24
Application No.: US12888409
Filing date: 2010-09-22
Applicants: John R. Nickolls, Brett W. Coon, Michael C. Shebanow
Inventors: John R. Nickolls, Brett W. Coon, Michael C. Shebanow
IPC classification: G06F12/08, G06F12/00
CPC classification: G06F9/3887, G06F9/30043, G06F9/3009, G06F9/3836, G06F12/0811, G06F12/0862, G06F12/0875, G06F12/0897, G06F12/121, G06F2212/452
Abstract: A method for managing a parallel cache hierarchy in a processing unit. The method includes receiving an instruction from a scheduler unit, where the instruction comprises a load instruction or a store instruction; determining that the instruction includes a cache operations modifier that identifies a policy for caching data associated with the instruction at one or more levels of the parallel cache hierarchy; and executing the instruction and caching the data associated with the instruction based on the cache operations modifier.
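
To make the idea of a per-instruction cache operations modifier concrete, here is a toy two-level model. It is a sketch under assumptions and not the method claimed in the patent: the `CacheOp` values, the `Hierarchy` class, and the write-through store behavior are hypothetical choices made for illustration.

```python
from enum import Enum


class CacheOp(Enum):
    """Hypothetical cache operations modifiers carried by a load or store."""
    CACHE_ALL = "all"        # keep the line in L1 and L2
    CACHE_L2_ONLY = "l2"     # bypass L1, keep the line in L2
    NO_CACHE = "none"        # always go to memory, cache nowhere


class Hierarchy:
    def __init__(self, memory):
        self.memory = memory
        self.l1 = {}
        self.l2 = {}

    def load(self, addr, op=CacheOp.CACHE_ALL):
        if op is CacheOp.CACHE_ALL and addr in self.l1:
            return self.l1[addr]
        if op is not CacheOp.NO_CACHE and addr in self.l2:
            value = self.l2[addr]
        else:
            value = self.memory[addr]
        # Fill only the levels that the modifier allows.
        if op is not CacheOp.NO_CACHE:
            self.l2[addr] = value
        if op is CacheOp.CACHE_ALL:
            self.l1[addr] = value
        return value

    def store(self, addr, value, op=CacheOp.CACHE_ALL):
        # Write-through sketch: update memory, drop stale copies, then
        # re-insert the line only at the levels the modifier allows.
        self.memory[addr] = value
        self.l1.pop(addr, None)
        self.l2.pop(addr, None)
        if op is not CacheOp.NO_CACHE:
            self.l2[addr] = value
        if op is CacheOp.CACHE_ALL:
            self.l1[addr] = value


h = Hierarchy({0x10: 7, 0x20: 9})
h.load(0x10)                          # cached in L1 and L2
h.load(0x20, CacheOp.CACHE_L2_ONLY)   # cached in L2 only
print(sorted(h.l1), sorted(h.l2))     # [16] [16, 32]
```

The modifier travels with each instruction, so two loads in the same program can leave different footprints in the hierarchy.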

56. Granted patent

US07680988B1 Single interconnect providing read and write access to a memory shared by concurrent threads (in force)
Publication No.: US07680988B1
Publication date: 2010-03-16
Application No.: US11554563
Filing date: 2006-10-30
Applicants: John R. Nickolls, Brett W. Coon, Ming Y. Siu, Stuart F. Oberman, Samuel Liu
Inventors: John R. Nickolls, Brett W. Coon, Ming Y. Siu, Stuart F. Oberman, Samuel Liu
IPC classification: G06F13/16
CPC classification: G06F12/084, G06F9/544, G06F15/167, Y02D10/13
Abstract: A shared memory is usable by concurrent threads in a multithreaded processor, with any addressable storage location in the shared memory being readable and writeable by any of the threads. Processing engines that execute the threads are coupled to the shared memory via an interconnect that transfers data in only one direction (e.g., from the shared memory to the processing engines); the same interconnect supports both read and write operations. The interconnect advantageously supports multiple parallel read or write operations.

57. Granted patent

US09952977B2 Cache operations and policies for a multi-threaded client (in force)
Publication No.: US09952977B2
Publication date: 2018-04-24
Application No.: US12890476
Filing date: 2010-09-24
Applicants: Steven James Heinrich, Alexander L. Minkin, Brett W. Coon, Rajeshwaran Selvanesan, Robert Steven Glanville, Charles McCarver, Anjana Rajendran, Stewart Glenn Carlton, John R. Nickolls, Brian Fahs
Inventors: Steven James Heinrich, Alexander L. Minkin, Brett W. Coon, Rajeshwaran Selvanesan, Robert Steven Glanville, Charles McCarver, Anjana Rajendran, Stewart Glenn Carlton, John R. Nickolls, Brian Fahs
IPC classification: G06F12/00, G06F12/0842, G06F12/0897
CPC classification: G06F12/0842, G06F12/0897
Abstract: A method for managing a parallel cache hierarchy in a processing unit. The method including receiving an instruction that includes a cache operations modifier that identifies a level of the parallel cache hierarchy in which to cache data associated with the instruction; and implementing a cache replacement policy based on the cache operations modifier.

58. Granted patent

US08539204B2 Cooperative thread array reduction and scan operations (in force)
Publication No.: US08539204B2
Publication date: 2013-09-17
Application No.: US12890227
Filing date: 2010-09-24
Applicants: Brian Fahs, Ming Y. Siu, Brett W. Coon, John R. Nickolls, Lars Nyland
Inventors: Brian Fahs, Ming Y. Siu, Brett W. Coon, John R. Nickolls, Lars Nyland
IPC classification: G06F9/30, G06F9/40, G06F15/00
CPC classification: G06F9/522, G06F8/458, G06F9/3004, G06F9/30087, G06F9/30145, G06F9/3851
Abstract: One embodiment of the present invention sets forth a technique for performing aggregation operations across multiple threads that execute independently. Aggregation is specified as part of a barrier synchronization or barrier arrival instruction, where in addition to performing the barrier synchronization or arrival, the instruction aggregates (using reduction or scan operations) values supplied by each thread. When a thread executes the barrier aggregation instruction the thread contributes to a scan or reduction result, and waits to execute any more instructions until after all of the threads have executed the barrier aggregation instruction. A reduction result is communicated to each thread after all of the threads have executed the barrier aggregation instruction and a scan result is communicated to each thread as the barrier aggregation instruction is executed by the thread.

59. Granted patent

US08522000B2 Trap handler architecture for a parallel processing unit (in force)
Publication No.: US08522000B2
Publication date: 2013-08-27
Application No.: US12569831
Filing date: 2009-09-29
Applicants: Michael C. Shebanow, Jack Choquette, Brett W. Coon, Steven J. Heinrich, Aravind Kalaiah, John R. Nickolls, Daniel Salinas, Ming Y. Siu, Tommy Thorn, Nicholas Wang
Inventors: Michael C. Shebanow, Jack Choquette, Brett W. Coon, Steven J. Heinrich, Aravind Kalaiah, John R. Nickolls, Daniel Salinas, Ming Y. Siu, Tommy Thorn, Nicholas Wang
IPC classification: G06F9/00
CPC classification: G06F9/327, G06F9/3851, G06F9/3861
Abstract: A trap handler architecture is incorporated into a parallel processing subsystem such as a GPU. The trap handler architecture minimizes design complexity and verification efforts for concurrently executing threads by imposing a property that all thread groups associated with a streaming multi-processor are either all executing within their respective code segments or are all executing within the trap handler code segment.
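
The all-or-nothing property in this abstract can be stated as a checkable invariant. The abstract does not say how the hardware enforces it, so the model below is purely an assumption made for illustration: `SMTrapModel`, its methods, and the handler address `0x1000` are hypothetical, and program counters are plain integers.

```python
class SMTrapModel:
    """Tracks where each thread group of one multiprocessor is executing."""

    def __init__(self, group_ids, handler_pc=0x1000):
        self.handler_pc = handler_pc
        self.pc = {g: 0 for g in group_ids}
        self.saved_pc = None   # None means no group is in the trap handler

    def step(self, group_id):
        self.pc[group_id] += 1

    def trap(self):
        # A trap sends every group to the handler together.
        if self.saved_pc is None:
            self.saved_pc = dict(self.pc)
            self.pc = {g: self.handler_pc for g in self.pc}

    def resume(self):
        # All groups leave the handler together and continue where they were.
        if self.saved_pc is not None:
            self.pc, self.saved_pc = self.saved_pc, None

    def invariant_holds(self):
        in_handler = [pc >= self.handler_pc for pc in self.pc.values()]
        return all(in_handler) or not any(in_handler)


sm = SMTrapModel(["g0", "g1", "g2"])
sm.step("g0")
sm.step("g1")
sm.step("g1")
sm.trap()
print(sm.invariant_holds(), set(sm.pc.values()) == {0x1000})  # True True
sm.resume()
print(sm.pc)  # {'g0': 1, 'g1': 2, 'g2': 0}
```

With only two reachable configurations per multiprocessor, there are no mixed states to design for or verify, which is the simplification the abstract points to.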

60. Granted patent

US08375176B2 Lock mechanism to enable atomic updates to shared memory (in force)
Publication No.: US08375176B2
Publication date: 2013-02-12
Application No.: US13276224
Filing date: 2011-10-18
Applicants: Brett W. Coon, John R. Nickolls, Lars Nyland, Peter C. Mills
Inventors: Brett W. Coon, John R. Nickolls, Lars Nyland, Peter C. Mills
IPC classification: G06F12/00
CPC classification: G06F12/084, G06F9/3004, G06F9/30087, G06F9/30185, G06F9/526, G06F2209/521
Abstract: A system and method for locking and unlocking access to a shared memory for atomic operations provides immediate feedback indicating whether or not the lock was successful. Read data is returned to the requestor with the lock status. The lock status may be changed concurrently when locking during a read or unlocking during a write. Therefore, it is not necessary to check the lock status as a separate transaction prior to or during a read-modify-write operation. Additionally, a lock or unlock may be explicitly specified for each atomic memory operation. Therefore, lock operations are not performed for operations that do not modify the contents of a memory location.
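
The lock-with-read and unlock-with-write pattern from this abstract is sketched below as a single-threaded model in which each `load` or `store` call stands for one indivisible memory transaction. The names `LockedMemory` and `atomic_add`, and the per-address lock granularity, are assumptions for illustration and are not taken from the patent.

```python
class LockedMemory:
    """Shared memory whose reads can acquire a per-address lock bit."""

    def __init__(self, size):
        self.data = [0] * size
        self.locked = [False] * size

    def load(self, addr, lock=False):
        """Return (value, lock_acquired); plain loads never touch the lock."""
        if not lock:
            return self.data[addr], False
        if self.locked[addr]:
            return self.data[addr], False   # another requestor holds the lock
        self.locked[addr] = True
        return self.data[addr], True        # data and lock status together

    def store(self, addr, value, unlock=False):
        self.data[addr] = value
        if unlock:
            self.locked[addr] = False       # release in the same transaction


def atomic_add(mem, addr, delta):
    """Read-modify-write that retries until the locking load succeeds."""
    while True:
        value, acquired = mem.load(addr, lock=True)
        if acquired:
            mem.store(addr, value + delta, unlock=True)
            return value


mem = LockedMemory(4)
print(mem.load(2, lock=True))   # (0, True)
print(mem.load(2, lock=True))   # (0, False)
mem.store(2, 5, unlock=True)
print(atomic_add(mem, 2, 3), mem.data[2])  # 5 8
```

Because the lock status comes back with the read data, the requestor learns in one transaction whether it may proceed, and loads issued without `lock=True` skip the lock entirely.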
