    • 65. Granted invention patent
    • Counter-based delay of dependent thread group execution
    • Publication number: US07526634B1
    • Publication date: 2009-04-28
    • Application number: US11535871
    • Filing date: 2006-09-27
    • Inventors: Jerome F. Duluk, Jr.; Stephen D. Lew; John R. Nickolls
    • IPC: G06F9/40
    • CPC: G06F9/52; G06F9/546; G06F2209/548
    • Abstract: Systems and methods for synchronizing processing work performed by threads, cooperative thread arrays (CTAs), or “sets” of CTAs. A central processing unit can load launch commands for a first set of CTAs and a second set of CTAs in a pushbuffer, and specify a dependency of the second set upon completion of execution of the first set. A parallel or graphics processor (GPU) can autonomously execute the first set of CTAs and delay execution of the second set of CTAs until the first set of CTAs is complete. In some embodiments the GPU may determine that a third set of CTAs is not dependent upon the first set, and may launch the third set of CTAs while the second set of CTAs is delayed. In this manner, the GPU may execute launch commands out of order with respect to the order of the launch commands in the pushbuffer.
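The dependency mechanism this abstract describes can be sketched as a small software model: each launched CTA set gets a counter of outstanding CTAs, a command whose dependency has not completed is deferred, and later independent commands may launch ahead of it. This is a minimal illustrative sketch in Python; the class name, method names, and completion-callback interface are assumptions for illustration, not the patent's actual hardware design.

```python
class GpuModel:
    """Toy model of counter-based delay of dependent CTA-set launches."""

    def __init__(self):
        self.counters = {}      # set_id -> CTAs still executing
        self.completed = set()  # set_ids whose counter reached zero
        self.deferred = []      # launch commands waiting on a dependency
        self.launch_order = []  # actual launch order (may differ from submit order)

    def submit(self, set_id, num_ctas, depends_on=None):
        """Process one launch command from the pushbuffer, in pushbuffer order."""
        if depends_on is not None and depends_on not in self.completed:
            self.deferred.append((set_id, num_ctas, depends_on))
        else:
            self._launch(set_id, num_ctas)

    def _launch(self, set_id, num_ctas):
        self.counters[set_id] = num_ctas
        self.launch_order.append(set_id)

    def cta_finished(self, set_id):
        """One CTA of a running set completed; retry deferred launches at zero."""
        self.counters[set_id] -= 1
        if self.counters[set_id] == 0:
            self.completed.add(set_id)
            self._retry_deferred()

    def _retry_deferred(self):
        still_waiting = []
        for set_id, num_ctas, dep in self.deferred:
            if dep in self.completed:
                self._launch(set_id, num_ctas)
            else:
                still_waiting.append((set_id, num_ctas, dep))
        self.deferred = still_waiting


# Pushbuffer order: first, second (depends on first), third (independent).
gpu = GpuModel()
gpu.submit("first", 2)
gpu.submit("second", 1, depends_on="first")   # deferred: "first" still running
gpu.submit("third", 1)                        # independent: launches ahead
gpu.cta_finished("first")
gpu.cta_finished("first")                     # counter hits zero -> "second" launches
# gpu.launch_order is now ["first", "third", "second"]: out of pushbuffer order.
```

The counter is the key piece: completion of a *set* is detected without inspecting individual threads, so the dependent launch can be released the moment the count reaches zero.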
    • 67. Granted invention patent
    • Address mapping for a parallel thread processor
    • Publication number: US08700877B2
    • Publication date: 2014-04-15
    • Application number: US12890518
    • Filing date: 2010-09-24
    • Inventors: Michael C. Shebanow; Yan Yan Tang; John R. Nickolls
    • IPC: G06F12/00; G06F13/00; G06F13/28
    • CPC: G06F12/0284; G06F9/3851; G06F12/0607
    • Abstract: A method for thread address mapping in a parallel thread processor. The method includes receiving a thread address associated with a first thread in a thread group; computing an effective address based on a location of the thread address within a local window of a thread address space; computing a thread group address in an address space associated with the thread group based on the effective address and a thread identifier associated with a first thread; and computing a virtual address associated with the first thread based on the thread group address and a thread group identifier, where the virtual address is used to access a location in a memory associated with the thread address to load or store data.
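The abstract's three-step mapping (thread address → effective address → thread-group address → virtual address) can be illustrated with a toy function. All constants, the per-thread interleaving scheme, and the per-group region layout below are assumptions chosen for clarity; the patent does not fix these specifics in the abstract.

```python
def map_thread_address(thread_addr, tid, group_id,
                       window_base=0x1000, window_size=0x1000,
                       group_size=32, group_region_base=0x100000):
    """Toy three-step address mapping following the abstract.

    Step 1: effective address = position within the local window.
    Step 2: thread-group address = effective address interleaved with the
            thread identifier (one assumed scheme: consecutive threads
            accessing the same effective address land in adjacent slots).
    Step 3: virtual address = thread-group address placed in a per-group
            region selected by the thread-group identifier.
    """
    if not (window_base <= thread_addr < window_base + window_size):
        raise ValueError("thread address outside the local window")
    effective = thread_addr - window_base                      # step 1
    group_addr = effective * group_size + tid                  # step 2
    virtual = (group_region_base
               + group_id * (window_size * group_size)
               + group_addr)                                   # step 3
    return virtual
```

With this layout, thread 0 and thread 1 of group 0 reading the same per-thread address `0x1000` get adjacent virtual addresses (`0x100000` and `0x100001`), which is the kind of interleaving that lets a parallel processor turn per-thread "local" accesses into well-behaved accesses to shared memory.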
    • 70. Granted invention patent
    • Systems and methods for coalescing memory accesses of parallel threads
    • Publication number: US08086806B2
    • Publication date: 2011-12-27
    • Application number: US12054330
    • Filing date: 2008-03-24
    • Inventors: Lars Nyland; John R. Nickolls; Gentaro Hirota; Tanmoy Mandal
    • IPC: G06F12/00
    • CPC: G06F9/3824; G06F9/3851; G06F9/3885; G06F9/3891
    • Abstract: One embodiment of the present invention sets forth a technique for efficiently and flexibly performing coalesced memory accesses for a thread group. For each read application request that services a thread group, the core interface generates one pending request table (PRT) entry and one or more memory access requests. The core interface determines the number of memory access requests and the size of each memory access request based on the spread of the memory access addresses in the application request. Each memory access request specifies the particular threads that the memory access request services. The PRT entry tracks the number of pending memory access requests. As the memory interface completes each memory access request, the core interface uses information in the memory access request and the corresponding PRT entry to route the returned data. When all the memory access requests associated with a particular PRT entry are complete, the core interface satisfies the corresponding application request and frees the PRT entry.
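The PRT scheme in this abstract can be sketched as a toy model: per-thread addresses are grouped into one memory access per aligned segment, a PRT entry counts how many of those accesses are still pending, returned data is routed to the requesting threads, and the entry is freed when the count reaches zero. The 128-byte segment size, the class and method names, and the dict-based routing below are illustrative assumptions, not the patent's actual hardware structures.

```python
from collections import defaultdict

SEGMENT = 128  # assumed coalescing granularity in bytes


def split_request(thread_addrs):
    """Split an application request into one memory access per aligned
    segment, based on the spread of the per-thread addresses.

    thread_addrs: list of (thread_id, byte_address) pairs.
    Each access records which threads it services.
    """
    segments = defaultdict(list)
    for tid, addr in thread_addrs:
        segments[addr // SEGMENT].append(tid)
    return [{"base": seg * SEGMENT, "threads": tids}
            for seg, tids in sorted(segments.items())]


class PRT:
    """Toy pending request table: one entry per application request."""

    def __init__(self):
        self.entries = {}
        self.next_id = 0

    def allocate(self, accesses):
        """Create an entry tracking the number of pending accesses."""
        entry_id = self.next_id
        self.next_id += 1
        self.entries[entry_id] = {"pending": len(accesses), "data": {}}
        return entry_id

    def complete(self, entry_id, access, line):
        """Memory interface finished one access: route its data to the
        threads it services; free the entry when nothing is pending."""
        entry = self.entries[entry_id]
        for tid in access["threads"]:
            entry["data"][tid] = line  # routing simplified: whole line per thread
        entry["pending"] -= 1
        if entry["pending"] == 0:
            result = entry["data"]
            del self.entries[entry_id]  # satisfy the request, free the entry
            return result
        return None
```

For example, four threads reading addresses 0, 4, 8, and 132 fall into two 128-byte segments, so one application request becomes two memory accesses; only when both complete is the request satisfied and the PRT entry freed.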