    • 65. Granted invention patent
    • Counter-based delay of dependent thread group execution
    • Publication number: US07526634B1
    • Publication date: 2009-04-28
    • Application number: US11535871
    • Filing date: 2006-09-27
    • Inventors: Jerome F. Duluk, Jr.; Stephen D. Lew; John R. Nickolls
    • IPC: G06F9/40
    • CPC: G06F9/52; G06F9/546; G06F2209/548
    • Abstract: Systems and methods for synchronizing processing work performed by threads, cooperative thread arrays (CTAs), or “sets” of CTAs. A central processing unit can load launch commands for a first set of CTAs and a second set of CTAs in a pushbuffer, and specify a dependency of the second set upon completion of execution of the first set. A parallel or graphics processor (GPU) can autonomously execute the first set of CTAs and delay execution of the second set of CTAs until the first set of CTAs is complete. In some embodiments the GPU may determine that a third set of CTAs is not dependent upon the first set, and may launch the third set of CTAs while the second set of CTAs is delayed. In this manner, the GPU may execute launch commands out of order with respect to the order of the launch commands in the pushbuffer.
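The dependency mechanism this abstract describes can be sketched as a small software model: each launched CTA set gets a counter of outstanding CTAs, a command whose dependency has not completed is deferred, and later independent commands may launch ahead of it. This is a minimal illustrative sketch in Python; the class name, method names, and completion-callback interface are assumptions for illustration, not the patent's actual hardware design.

```python
class GpuModel:
    """Toy model of counter-based delay of dependent CTA-set launches."""

    def __init__(self):
        self.counters = {}      # set_id -> CTAs still executing
        self.completed = set()  # set_ids whose counter reached zero
        self.deferred = []      # launch commands waiting on a dependency
        self.launch_order = []  # actual launch order (may differ from submit order)

    def submit(self, set_id, num_ctas, depends_on=None):
        """Process one launch command from the pushbuffer, in pushbuffer order."""
        if depends_on is not None and depends_on not in self.completed:
            self.deferred.append((set_id, num_ctas, depends_on))
        else:
            self._launch(set_id, num_ctas)

    def _launch(self, set_id, num_ctas):
        self.counters[set_id] = num_ctas
        self.launch_order.append(set_id)

    def cta_finished(self, set_id):
        """One CTA of a running set completed; retry deferred launches at zero."""
        self.counters[set_id] -= 1
        if self.counters[set_id] == 0:
            self.completed.add(set_id)
            self._retry_deferred()

    def _retry_deferred(self):
        still_waiting = []
        for set_id, num_ctas, dep in self.deferred:
            if dep in self.completed:
                self._launch(set_id, num_ctas)
            else:
                still_waiting.append((set_id, num_ctas, dep))
        self.deferred = still_waiting


# Pushbuffer order: first, second (depends on first), third (independent).
gpu = GpuModel()
gpu.submit("first", 2)
gpu.submit("second", 1, depends_on="first")   # deferred: "first" still running
gpu.submit("third", 1)                        # independent: launches ahead
gpu.cta_finished("first")
gpu.cta_finished("first")                     # counter hits zero -> "second" launches
# gpu.launch_order is now ["first", "third", "second"]: out of pushbuffer order.
```

The counter is the key piece: completion of a *set* is detected without inspecting individual threads, so the dependent launch can be released the moment the count reaches zero.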
    • 67. Granted invention patent
    • Address mapping for a parallel thread processor
    • Publication number: US08700877B2
    • Publication date: 2014-04-15
    • Application number: US12890518
    • Filing date: 2010-09-24
    • Inventors: Michael C. Shebanow; Yan Yan Tang; John R. Nickolls
    • IPC: G06F12/00; G06F13/00; G06F13/28
    • CPC: G06F12/0284; G06F9/3851; G06F12/0607
    • Abstract: A method for thread address mapping in a parallel thread processor. The method includes receiving a thread address associated with a first thread in a thread group; computing an effective address based on a location of the thread address within a local window of a thread address space; computing a thread group address in an address space associated with the thread group based on the effective address and a thread identifier associated with a first thread; and computing a virtual address associated with the first thread based on the thread group address and a thread group identifier, where the virtual address is used to access a location in a memory associated with the thread address to load or store data.
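The abstract's three-step mapping (thread address → effective address → thread-group address → virtual address) can be illustrated with a toy function. All constants, the per-thread interleaving scheme, and the per-group region layout below are assumptions chosen for clarity; the patent does not fix these specifics in the abstract.

```python
def map_thread_address(thread_addr, tid, group_id,
                       window_base=0x1000, window_size=0x1000,
                       group_size=32, group_region_base=0x100000):
    """Toy three-step address mapping following the abstract.

    Step 1: effective address = position within the local window.
    Step 2: thread-group address = effective address interleaved with the
            thread identifier (one assumed scheme: consecutive threads
            accessing the same effective address land in adjacent slots).
    Step 3: virtual address = thread-group address placed in a per-group
            region selected by the thread-group identifier.
    """
    if not (window_base <= thread_addr < window_base + window_size):
        raise ValueError("thread address outside the local window")
    effective = thread_addr - window_base                      # step 1
    group_addr = effective * group_size + tid                  # step 2
    virtual = (group_region_base
               + group_id * (window_size * group_size)
               + group_addr)                                   # step 3
    return virtual
```

With this layout, thread 0 and thread 1 of group 0 reading the same per-thread address `0x1000` get adjacent virtual addresses (`0x100000` and `0x100001`), which is the kind of interleaving that lets a parallel processor turn per-thread "local" accesses into well-behaved accesses to shared memory.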
    • 70. Granted invention patent
    • Systems and methods for coalescing memory accesses of parallel threads
    • Publication number: US08086806B2
    • Publication date: 2011-12-27
    • Application number: US12054330
    • Filing date: 2008-03-24
    • Inventors: Lars Nyland; John R. Nickolls; Gentaro Hirota; Tanmoy Mandal
    • IPC: G06F12/00
    • CPC: G06F9/3824; G06F9/3851; G06F9/3885; G06F9/3891
    • Abstract: One embodiment of the present invention sets forth a technique for efficiently and flexibly performing coalesced memory accesses for a thread group. For each read application request that services a thread group, the core interface generates one pending request table (PRT) entry and one or more memory access requests. The core interface determines the number of memory access requests and the size of each memory access request based on the spread of the memory access addresses in the application request. Each memory access request specifies the particular threads that the memory access request services. The PRT entry tracks the number of pending memory access requests. As the memory interface completes each memory access request, the core interface uses information in the memory access request and the corresponding PRT entry to route the returned data. When all the memory access requests associated with a particular PRT entry are complete, the core interface satisfies the corresponding application request and frees the PRT entry.
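The PRT scheme in this abstract can be sketched as a toy model: per-thread addresses are grouped into one memory access per aligned segment, a PRT entry counts how many of those accesses are still pending, returned data is routed to the requesting threads, and the entry is freed when the count reaches zero. The 128-byte segment size, the class and method names, and the dict-based routing below are illustrative assumptions, not the patent's actual hardware structures.

```python
from collections import defaultdict

SEGMENT = 128  # assumed coalescing granularity in bytes


def split_request(thread_addrs):
    """Split an application request into one memory access per aligned
    segment, based on the spread of the per-thread addresses.

    thread_addrs: list of (thread_id, byte_address) pairs.
    Each access records which threads it services.
    """
    segments = defaultdict(list)
    for tid, addr in thread_addrs:
        segments[addr // SEGMENT].append(tid)
    return [{"base": seg * SEGMENT, "threads": tids}
            for seg, tids in sorted(segments.items())]


class PRT:
    """Toy pending request table: one entry per application request."""

    def __init__(self):
        self.entries = {}
        self.next_id = 0

    def allocate(self, accesses):
        """Create an entry tracking the number of pending accesses."""
        entry_id = self.next_id
        self.next_id += 1
        self.entries[entry_id] = {"pending": len(accesses), "data": {}}
        return entry_id

    def complete(self, entry_id, access, line):
        """Memory interface finished one access: route its data to the
        threads it services; free the entry when nothing is pending."""
        entry = self.entries[entry_id]
        for tid in access["threads"]:
            entry["data"][tid] = line  # routing simplified: whole line per thread
        entry["pending"] -= 1
        if entry["pending"] == 0:
            result = entry["data"]
            del self.entries[entry_id]  # satisfy the request, free the entry
            return result
        return None
```

For example, four threads reading addresses 0, 4, 8, and 132 fall into two 128-byte segments, so one application request becomes two memory accesses; only when both complete is the request satisfied and the PRT entry freed.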