Quick Patent Search - search global patents quickly, free patent database for commercial use - IPRDB

1. Granted invention patent

US08522000B2 Trap handler architecture for a parallel processing unit (in force)
Publication (announcement) number: US08522000B2
Publication (announcement) date: 2013-08-27
Application number: US12569831
Filing date: 2009-09-29
Applicants: Michael C. Shebanow, Jack Choquette, Brett W. Coon, Steven J. Heinrich, Aravind Kalaiah, John R. Nickolls, Daniel Salinas, Ming Y. Siu, Tommy Thorn, Nicholas Wang
Inventors: Michael C. Shebanow, Jack Choquette, Brett W. Coon, Steven J. Heinrich, Aravind Kalaiah, John R. Nickolls, Daniel Salinas, Ming Y. Siu, Tommy Thorn, Nicholas Wang
IPC classification: G06F9/00
CPC classification: G06F9/327, G06F9/3851, G06F9/3861
Abstract: A trap handler architecture is incorporated into a parallel processing subsystem such as a GPU. The trap handler architecture minimizes design complexity and verification efforts for concurrently executing threads by imposing a property that all thread groups associated with a streaming multi-processor are either all executing within their respective code segments or are all executing within the trap handler code segment.
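
The property stated in this abstract can be illustrated with a small software model. The following is only a minimal sketch under stated assumptions, not the patented hardware design: the class names, the `TRAP_HANDLER_PC` value, and the save/restore scheme are all hypothetical. It shows one way to keep every thread group of a streaming multiprocessor either in its own code or in the trap handler, never a mixture.

```python
from dataclasses import dataclass
from typing import List, Optional

TRAP_HANDLER_PC = 0x1000  # hypothetical entry point of the trap handler segment


@dataclass
class ThreadGroup:
    group_id: int
    pc: int = 0                  # current program counter
    saved_pc: int = 0            # user-code PC to resume at after the trap
    in_trap_handler: bool = False


@dataclass
class StreamingMultiprocessor:
    groups: List[ThreadGroup]
    trap_source: Optional[int] = None  # id of the group that raised the trap

    def raise_trap(self, source_id: int) -> None:
        """A trap in one group redirects every group on this SM to the handler."""
        self.trap_source = source_id
        for g in self.groups:
            if not g.in_trap_handler:
                g.saved_pc = g.pc
                g.pc = TRAP_HANDLER_PC
                g.in_trap_handler = True

    def return_from_trap(self) -> None:
        """All groups leave the handler together and resume their own code."""
        for g in self.groups:
            g.pc = g.saved_pc
            g.in_trap_handler = False
        self.trap_source = None

    def invariant_holds(self) -> bool:
        """True when the groups are all in user code or all in the handler."""
        return len({g.in_trap_handler for g in self.groups}) <= 1
```

Because entry and exit are applied to every group at once in this model, `invariant_holds()` stays true before, during, and after a trap, which is the kind of simplification for verification that the abstract describes.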

2. Invention patent application

US20110078427A1 TRAP HANDLER ARCHITECTURE FOR A PARALLEL PROCESSING UNIT (in force)
Publication (announcement) number: US20110078427A1
Publication (announcement) date: 2011-03-31
Application number: US12569831
Filing date: 2009-09-29
Applicants: Michael C. Shebanow, Jack Choquette, Brett W. Coon, Steven J. Heinrich, Aravind Kalaiah, John R. Nickolls, Daniel Salinas, Ming Y. Siu, Tommy Thorn, Nicholas Wang
Inventors: Michael C. Shebanow, Jack Choquette, Brett W. Coon, Steven J. Heinrich, Aravind Kalaiah, John R. Nickolls, Daniel Salinas, Ming Y. Siu, Tommy Thorn, Nicholas Wang
IPC classification: G06F9/38
CPC classification: G06F9/327, G06F9/3851, G06F9/3861
Abstract: A trap handler architecture is incorporated into a parallel processing subsystem such as a GPU. The trap handler architecture minimizes design complexity and verification efforts for concurrently executing threads by imposing a property that all thread groups associated with a streaming multi-processor are either all executing within their respective code segments or are all executing within the trap handler code segment.

3. Granted invention patent

US09223578B2 Coalescing memory barrier operations across multiple parallel threads (in force)
Publication (announcement) number: US09223578B2
Publication (announcement) date: 2015-12-29
Application number: US12887081
Filing date: 2010-09-21
Applicants: John R. Nickolls, Steven James Heinrich, Brett W. Coon, Michael C. Shebanow
Inventors: John R. Nickolls, Steven James Heinrich, Brett W. Coon, Michael C. Shebanow
IPC classification: G06F9/46, G06F9/38, G06F9/30
CPC classification: G06F9/3834, G06F9/3004, G06F9/30087, G06F9/3851
Abstract: One embodiment of the present invention sets forth a technique for coalescing memory barrier operations across multiple parallel threads. Memory barrier requests from a given parallel thread processing unit are coalesced to reduce the impact to the rest of the system. Additionally, memory barrier requests may specify a level of a set of threads with respect to which the memory transactions are committed. For example, a first type of memory barrier instruction may commit the memory transactions to a level of a set of cooperating threads that share an L1 (level one) cache. A second type of memory barrier instruction may commit the memory transactions to a level of a set of threads sharing a global memory. Finally, a third type of memory barrier instruction may commit the memory transactions to a system level of all threads sharing all system memories. The latency required to execute the memory barrier instruction varies based on the type of memory barrier instruction.
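
As a rough illustration of the coalescing idea in this abstract, the sketch below merges pending barrier requests from one processing unit into a single barrier. It rests on an assumption that the abstract does not state: that a barrier at a wider level also satisfies requests made at narrower levels. The names `BarrierLevel` and `coalesce_barriers` are hypothetical.

```python
from enum import IntEnum
from typing import Iterable, List, Optional, Tuple


class BarrierLevel(IntEnum):
    """The three levels named in the abstract, ordered narrowest to widest."""
    CTA = 1     # cooperating threads that share an L1 cache
    GLOBAL = 2  # threads sharing a global memory
    SYSTEM = 3  # all threads sharing all system memories


def coalesce_barriers(
    requests: Iterable[Tuple[int, BarrierLevel]],
) -> Optional[Tuple[BarrierLevel, List[int]]]:
    """Merge pending (thread_id, level) requests into one barrier.

    Returns the single level to issue and the threads waiting on it,
    or None when nothing is pending.
    """
    pending = list(requests)
    if not pending:
        return None
    # Assumption: the widest requested level covers every narrower request.
    widest = max(level for _, level in pending)
    waiters = [thread_id for thread_id, _ in pending]
    return widest, waiters
```

With three pending requests at mixed levels, this model issues one system-level barrier instead of three separate ones, which is the reduction in system impact the abstract refers to. A real design might instead track each level separately to avoid paying the widest level's latency for every waiter.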

4. Invention patent application

US20110072213A1 INSTRUCTIONS FOR MANAGING A PARALLEL CACHE HIERARCHY (in force)
Publication (announcement) number: US20110072213A1
Publication (announcement) date: 2011-03-24
Application number: US12888409
Filing date: 2010-09-22
Applicants: John R. Nickolls, Brett W. Coon, Michael C. Shebanow
Inventors: John R. Nickolls, Brett W. Coon, Michael C. Shebanow
IPC classification: G06F12/08, G06F12/00
CPC classification: G06F9/3887, G06F9/30043, G06F9/3009, G06F9/3836, G06F12/0811, G06F12/0862, G06F12/0875, G06F12/0897, G06F12/121, G06F2212/452
Abstract: A method for managing a parallel cache hierarchy in a processing unit. The method includes receiving an instruction from a scheduler unit, where the instruction comprises a load instruction or a store instruction; determining that the instruction includes a cache operations modifier that identifies a policy for caching data associated with the instruction at one or more levels of the parallel cache hierarchy; and executing the instruction and caching the data associated with the instruction based on the cache operations modifier.
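
The three steps in this abstract (receive a load or store, read its cache operations modifier, then execute and cache accordingly) can be sketched in software as follows. This is an illustrative model only: the modifier names in `POLICIES`, the two-level layout, and the write-through store behaviour are assumptions chosen for brevity, not details taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical modifiers: which cache levels may keep the data.
POLICIES: Dict[str, Dict[str, bool]] = {
    "cache_all": {"l1": True, "l2": True},
    "l2_only": {"l1": False, "l2": True},
    "uncached": {"l1": False, "l2": False},
}


@dataclass
class Instruction:
    opcode: str                     # "load" or "store"
    address: int
    modifier: Optional[str] = None  # cache operations modifier, if present
    value: Optional[int] = None     # data for a store


class CacheHierarchy:
    def __init__(self, memory: Dict[int, int]) -> None:
        self.memory = memory
        self.l1: Dict[int, int] = {}
        self.l2: Dict[int, int] = {}

    def execute(self, inst: Instruction) -> Optional[int]:
        # Step 2: determine the policy named by the modifier (default: cache_all).
        policy = POLICIES[inst.modifier or "cache_all"]
        # Step 3a: execute the instruction.
        if inst.opcode == "load":
            if inst.address in self.l1:
                data = self.l1[inst.address]
            elif inst.address in self.l2:
                data = self.l2[inst.address]
            else:
                data = self.memory[inst.address]
        elif inst.opcode == "store":
            data = inst.value
            self.memory[inst.address] = data  # write-through, for simplicity
        else:
            raise ValueError(f"unsupported opcode: {inst.opcode}")
        # Step 3b: cache the data only at the levels the policy allows.
        for name, cache in (("l1", self.l1), ("l2", self.l2)):
            if policy[name]:
                cache[inst.address] = data
            else:
                cache.pop(inst.address, None)
        return data
```

For example, a load tagged `l2_only` returns the value from memory and leaves a copy in `l2` but not in `l1`, so the per-instruction modifier, rather than a global setting, decides where the line lives.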

5. Invention patent application

US20110078692A1 COALESCING MEMORY BARRIER OPERATIONS ACROSS MULTIPLE PARALLEL THREADS (in force)
Publication (announcement) number: US20110078692A1
Publication (announcement) date: 2011-03-31
Application number: US12887081
Filing date: 2010-09-21
Applicants: John R. Nickolls, Steven James Heinrich, Brett W. Coon, Michael C. Shebanow
Inventors: John R. Nickolls, Steven James Heinrich, Brett W. Coon, Michael C. Shebanow
IPC classification: G06F9/46
CPC classification: G06F9/3834, G06F9/3004, G06F9/30087, G06F9/3851
Abstract: One embodiment of the present invention sets forth a technique for coalescing memory barrier operations across multiple parallel threads. Memory barrier requests from a given parallel thread processing unit are coalesced to reduce the impact to the rest of the system. Additionally, memory barrier requests may specify a level of a set of threads with respect to which the memory transactions are committed. For example, a first type of memory barrier instruction may commit the memory transactions to a level of a set of cooperating threads that share an L1 (level one) cache. A second type of memory barrier instruction may commit the memory transactions to a level of a set of threads sharing a global memory. Finally, a third type of memory barrier instruction may commit the memory transactions to a system level of all threads sharing all system memories. The latency required to execute the memory barrier instruction varies based on the type of memory barrier instruction.

6. Granted invention patent

US09639479B2 Instructions for managing a parallel cache hierarchy (in force)
Publication (announcement) number: US09639479B2
Publication (announcement) date: 2017-05-02
Application number: US12888409
Filing date: 2010-09-22
Applicants: John R. Nickolls, Brett W. Coon, Michael C. Shebanow
Inventors: John R. Nickolls, Brett W. Coon, Michael C. Shebanow
IPC classification: G06F12/121, G06F12/0811, G06F12/0862, G06F9/30
CPC classification: G06F9/3887, G06F9/30043, G06F9/3009, G06F9/3836, G06F12/0811, G06F12/0862, G06F12/0875, G06F12/0897, G06F12/121, G06F2212/452
Abstract: A method for managing a parallel cache hierarchy in a processing unit. The method includes receiving an instruction from a scheduler unit, where the instruction comprises a load instruction or a store instruction; determining that the instruction includes a cache operations modifier that identifies a policy for caching data associated with the instruction at one or more levels of the parallel cache hierarchy; and executing the instruction and caching the data associated with the instruction based on the cache operations modifier.

7. Granted invention patent

US08533435B2 Reordering operands assigned to each one of read request ports concurrently accessing multibank register file to avoid bank conflict (in force)
Publication (announcement) number: US08533435B2
Publication (announcement) date: 2013-09-10
Application number: US12875843
Filing date: 2010-09-03
Applicants: Xiaogang Qiu, Ming Y. Siu, Yan Yan Tang, John Erik Lindholm, Michael C. Shebanow, Stuart F. Oberman
Inventors: Xiaogang Qiu, Ming Y. Siu, Yan Yan Tang, John Erik Lindholm, Michael C. Shebanow, Stuart F. Oberman
IPC classification: G06F9/34
CPC classification: G06F9/3012, G06F9/30098, G06F9/3824, G06F9/3851, G06F9/3885, G06F9/3887, G06F9/3889
Abstract: One embodiment of the present invention sets forth a technique for collecting operands specified by an instruction. As a sequence of instructions is received the operands specified by the instructions are assigned to ports, so that each one of the operands specified by a single instruction is assigned to a different port. Reading of the operands from a multi-bank register file is scheduled by selecting an operand from each one of the different ports to produce an operand read request and ensuring that two or more of the selected operands are not stored in the same bank of the multi-bank register file. The operands specified by the operand read request are read from the multi-bank register file in a single clock cycle. Each instruction is then executed as the operands specified by the instruction are read from the multi-bank register file and collected over one or more clock cycles.
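
The scheduling constraint in this abstract (one operand per port per cycle, and no two selected operands in the same register bank) can be sketched as a greedy loop. The version below is a deliberately simplified model: it assumes a register's bank is its number modulo `NUM_BANKS`, looks only at the head of each port queue, and does not perform the within-port reordering that the patent title mentions. All names are hypothetical.

```python
from collections import deque
from typing import Deque, List, Sequence

NUM_BANKS = 4  # hypothetical number of register file banks


def bank_of(reg: int) -> int:
    """Assumed mapping: consecutive registers rotate through the banks."""
    return reg % NUM_BANKS


def assign_to_ports(instructions: Sequence[Sequence[int]], num_ports: int) -> List[Deque[int]]:
    """Give each operand of one instruction its own port, in arrival order."""
    ports: List[Deque[int]] = [deque() for _ in range(num_ports)]
    for operands in instructions:
        if len(operands) > num_ports:
            raise ValueError("instruction has more operands than ports")
        for port, reg in zip(ports, operands):
            port.append(reg)
    return ports


def schedule_reads(ports: List[Deque[int]]) -> List[List[int]]:
    """Build one conflict-free read request per clock cycle until all ports drain."""
    cycles: List[List[int]] = []
    while any(ports):
        used_banks = set()
        request: List[int] = []
        for port in ports:
            # Take the head of this port only if its bank is still free this cycle.
            if port and bank_of(port[0]) not in used_banks:
                reg = port.popleft()
                used_banks.add(bank_of(reg))
                request.append(reg)
        cycles.append(request)
    return cycles
```

For the instruction stream `[[0, 4, 1], [2, 3]]` with three ports, registers 0 and 4 share bank 0, so register 4 is held back for one cycle while registers 0 and 1 are read together. Every request the loop emits touches each bank at most once, so it can be served in a single cycle.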

8. Invention patent application

US20110072243A1 Unified Collector Structure for Multi-Bank Register File (in force)
Publication (announcement) number: US20110072243A1
Publication (announcement) date: 2011-03-24
Application number: US12875843
Filing date: 2010-09-03
Applicants: Xiaogang Qiu, Ming Y. Siu, Yan Yan Tang, John Erik Lindholm, Michael C. Shebanow, Stuart F. Oberman
Inventors: Xiaogang Qiu, Ming Y. Siu, Yan Yan Tang, John Erik Lindholm, Michael C. Shebanow, Stuart F. Oberman
IPC classification: G06F9/30
CPC classification: G06F9/3012, G06F9/30098, G06F9/3824, G06F9/3851, G06F9/3885, G06F9/3887, G06F9/3889
Abstract: One embodiment of the present invention sets forth a technique for collecting operands specified by an instruction. As a sequence of instructions is received the operands specified by the instructions are assigned to ports, so that each one of the operands specified by a single instruction is assigned to a different port. Reading of the operands from a multi-bank register file is scheduled by selecting an operand from each one of the different ports to produce an operand read request and ensuring that two or more of the selected operands are not stored in the same bank of the multi-bank register file. The operands specified by the operand read request are read from the multi-bank register file in a single clock cycle. Each instruction is then executed as the operands specified by the instruction are read from the multi-bank register file and collected over one or more clock cycles.

9. Granted invention patent

US08700877B2 Address mapping for a parallel thread processor (in force)
Publication (announcement) number: US08700877B2
Publication (announcement) date: 2014-04-15
Application number: US12890518
Filing date: 2010-09-24
Applicants: Michael C. Shebanow, Yan Yan Tang, John R. Nickolls
Inventors: Michael C. Shebanow, Yan Yan Tang, John R. Nickolls
IPC classification: G06F12/00, G06F13/00, G06F13/28
CPC classification: G06F12/0284, G06F9/3851, G06F12/0607
Abstract: A method for thread address mapping in a parallel thread processor. The method includes receiving a thread address associated with a first thread in a thread group; computing an effective address based on a location of the thread address within a local window of a thread address space; computing a thread group address in an address space associated with the thread group based on the effective address and a thread identifier associated with a first thread; and computing a virtual address associated with the first thread based on the thread group address and a thread group identifier, where the virtual address is used to access a location in a memory associated with the thread address to load or store data.
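
The abstract describes a three-stage mapping: thread address to effective address, effective address to thread group address, and thread group address to virtual address. The sketch below walks through those stages with invented numbers. The window base and size, the word-interleaved layout across the threads of a group, and the virtual base address are all assumptions made for illustration; the abstract does not specify any of them.

```python
WORD = 4                            # assumed access granularity in bytes
THREADS_PER_GROUP = 32              # assumed thread group size
WINDOW_BASE = 0x1000_0000           # hypothetical start of the local window
WINDOW_SIZE = 0x1000                # hypothetical per-thread local bytes
LOCAL_MEMORY_VA_BASE = 0x8000_0000  # hypothetical virtual base of local memory


def effective_address(thread_addr: int) -> int:
    """Stage 1: offset of the thread address inside the local window."""
    offset = thread_addr - WINDOW_BASE
    if not 0 <= offset < WINDOW_SIZE:
        raise ValueError("thread address is outside the local window")
    return offset


def thread_group_address(eff: int, thread_id: int) -> int:
    """Stage 2: interleave words so neighbouring threads get neighbouring words."""
    word, byte = divmod(eff, WORD)
    return (word * THREADS_PER_GROUP + thread_id) * WORD + byte


def virtual_address(group_addr: int, group_id: int) -> int:
    """Stage 3: place each thread group in its own slice of virtual memory."""
    group_span = WINDOW_SIZE * THREADS_PER_GROUP
    return LOCAL_MEMORY_VA_BASE + group_id * group_span + group_addr


def map_thread_address(thread_addr: int, thread_id: int, group_id: int) -> int:
    """Compose the three stages for one thread's load or store."""
    eff = effective_address(thread_addr)
    return virtual_address(thread_group_address(eff, thread_id), group_id)
```

Under these assumptions, when all threads of a group access the same thread address, their virtual addresses land in consecutive words, a layout that lets the accesses of one group fall in a contiguous block.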

10. Granted invention patent

US07627723B1 Atomic memory operators in a parallel processor (in force)
Publication (announcement) number: US07627723B1
Publication (announcement) date: 2009-12-01
Application number: US11533896
Filing date: 2006-09-21
Applicants: Ian A. Buck, John R. Nickolls, Michael C. Shebanow, Lars S. Nyland
Inventors: Ian A. Buck, John R. Nickolls, Michael C. Shebanow, Lars S. Nyland
IPC classification: G06F13/00, G06F13/28
CPC classification: G06F13/4022, G06F9/3001, G06F9/30018, G06F9/30021, G06F9/3004, G06F9/30087, G06F9/3824, G06F9/3834, G06F9/3851, G06F9/3887, G06F9/526, G06F2209/521, G06T1/20, G09G5/363, G09G5/393
Abstract: Methods, apparatuses, and systems are presented for updating data in memory while executing multiple threads of instructions, involving receiving a single instruction from one of a plurality of concurrently executing threads of instructions, in response to the single instruction received, reading data from a specific memory location, performing an operation involving the data read from the memory location to generate a result, and storing the result to the specific memory location, without requiring separate load and store instructions, and in response to the single instruction received, precluding another one of the plurality of threads of instructions from altering data at the specific memory location while reading of the data from the specific memory location, performing the operation involving the data, and storing the result to the specific memory location.
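
The read, operate, and store sequence in this abstract is an atomic read-modify-write. The sketch below imitates it in software with a lock. This is an analogy under stated assumptions rather than the patented mechanism: `AtomicMemory` and `atomic_op` are hypothetical names, and a single lock over the whole memory is far coarser than the per-location exclusion the abstract describes.

```python
import operator
import threading
from typing import Callable, List


class AtomicMemory:
    def __init__(self, size: int) -> None:
        self._cells: List[int] = [0] * size
        self._lock = threading.Lock()  # one lock for all cells, for simplicity

    def atomic_op(self, address: int, op: Callable[[int, int], int], operand: int) -> int:
        """Read, apply op, and write back as one indivisible step.

        Returns the value that was stored before the update.
        """
        with self._lock:
            old = self._cells[address]
            self._cells[address] = op(old, operand)
            return old

    def load(self, address: int) -> int:
        with self._lock:
            return self._cells[address]


def hammer(mem: AtomicMemory, address: int, num_threads: int, iterations: int) -> None:
    """Increment one location from several threads at the same time."""
    def worker() -> None:
        for _ in range(iterations):
            mem.atomic_op(address, operator.add, 1)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Because each update holds the lock across the read, the operation, and the write, no increment is lost: after `hammer(mem, 0, 8, 1000)` the location holds exactly 8000. The same `atomic_op` accepts other binary operations, such as `max` or `operator.and_`.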
