    • 1. Invention application
    • METHODS FOR SCALABLY EXPLOITING PARALLELISM IN A PARALLEL PROCESSING SYSTEM
    • US20110238955A1
    • 2011-09-29
    • US13099035
    • 2011-05-02
    • John R. Nickolls; Stephen D. Lew
    • G06F9/30
    • G06F9/3851; G06F9/30072; G06F9/3012; G06F9/3889; G06F9/5066
    • Parallelism in a parallel processing subsystem is exploited in a scalable manner. A problem to be solved can be hierarchically decomposed into at least two levels of sub-problems. Individual threads of program execution are defined to solve the lowest-level sub-problems. The threads are grouped into one or more thread arrays, each of which solves a higher-level sub-problem. The thread arrays are executable by processing cores, each of which can execute at least one thread array at a time. Thread arrays can be grouped into grids of independent thread arrays, which solve still higher-level sub-problems or an entire problem. Thread arrays within a grid, or entire grids, can be distributed across all of the available processing cores as available in a particular system implementation.
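The hierarchical decomposition this abstract describes (problem → grid of thread arrays → individual threads) can be sketched in plain Python. The example and all names in it (`grid_work`, `block_work`, `block_dim`) are illustrative assumptions, not taken from the patent:

```python
# Sketch of the hierarchical decomposition described above: a problem
# (summing N numbers) is split into a grid of independent thread
# arrays ("blocks"), and each block into individual threads, each
# solving the lowest-level sub-problem (one element).

def thread_work(data, block_id, thread_id, block_dim):
    """Lowest-level sub-problem: one thread handles one element."""
    i = block_id * block_dim + thread_id
    return data[i] if i < len(data) else 0

def block_work(data, block_id, block_dim):
    """Higher-level sub-problem: one thread array sums its slice."""
    return sum(thread_work(data, block_id, t, block_dim)
               for t in range(block_dim))

def grid_work(data, block_dim=4):
    """Entire problem: a grid of independent thread arrays.
    Because blocks are independent, they could be distributed across
    however many processing cores a particular system provides."""
    grid_dim = (len(data) + block_dim - 1) // block_dim
    return sum(block_work(data, b, block_dim) for b in range(grid_dim))

print(grid_work(list(range(10))))  # 45
```

Because each block touches a disjoint slice of the input, varying `block_dim` or the number of cores changes only the schedule, not the result, which is the scalability property the abstract claims.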
    • 3. Invention grant
    • Synchronization of threads in a cooperative thread array
    • US07788468B1
    • 2010-08-31
    • US11303780
    • 2005-12-15
    • John R. Nickolls; Stephen D. Lew; Brett W. Coon; Peter C. Mills
    • G06F15/00; G06F15/76
    • G06F9/3851; G06F9/30087; G06F9/3009; G06F9/3834; G06F9/3887; G06F9/522
    • A “cooperative thread array,” or “CTA,” is a group of multiple threads that concurrently execute the same program on an input data set to produce an output data set. Each thread in a CTA has a unique thread identifier assigned at thread launch time that controls various aspects of the thread's processing behavior such as the portion of the input data set to be processed by each thread, the portion of an output data set to be produced by each thread, and/or sharing of intermediate results among threads. Different threads of the CTA are advantageously synchronized at appropriate points during CTA execution using a barrier synchronization technique in which barrier instructions in the CTA program are detected and used to suspend execution of some threads until a specified number of other threads also reaches the barrier point.
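The barrier behaviour this abstract describes (suspend each thread until a specified number of peers reaches the barrier, so intermediate results can be shared safely) can be illustrated with Python's `threading.Barrier`. This is a CPU-thread analogy, not the patented GPU mechanism, and all names are illustrative:

```python
import threading

# Sketch of barrier synchronization among the threads of one "CTA":
# each thread writes an intermediate result, waits at the barrier
# until all peers arrive, then safely reads a neighbour's value.

NUM_THREADS = 4
shared = [0] * NUM_THREADS     # intermediate results shared among threads
results = [0] * NUM_THREADS
barrier = threading.Barrier(NUM_THREADS)

def cta_thread(tid):
    shared[tid] = tid * 10                          # produce intermediate result
    barrier.wait()                                  # suspend until all threads arrive
    results[tid] = shared[(tid + 1) % NUM_THREADS]  # now safe to read a peer's value

threads = [threading.Thread(target=cta_thread, args=(t,))
           for t in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # [10, 20, 30, 0]
```

Without the `barrier.wait()` call, a fast thread could read `shared` before its neighbour had written it, which is exactly the race the barrier instruction prevents.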
    • 4. Invention grant
    • Methods for scalably exploiting parallelism in a parallel processing system
    • US08099584B2
    • 2012-01-17
    • US13099035
    • 2011-05-02
    • John R. Nickolls; Stephen D. Lew
    • G06F9/30
    • G06F9/3851; G06F9/30072; G06F9/3012; G06F9/3889; G06F9/5066
    • Parallelism in a parallel processing subsystem is exploited in a scalable manner. A problem to be solved can be hierarchically decomposed into at least two levels of sub-problems. Individual threads of program execution are defined to solve the lowest-level sub-problems. The threads are grouped into one or more thread arrays, each of which solves a higher-level sub-problem. The thread arrays are executable by processing cores, each of which can execute at least one thread array at a time. Thread arrays can be grouped into grids of independent thread arrays, which solve still higher-level sub-problems or an entire problem. Thread arrays within a grid, or entire grids, can be distributed across all of the available processing cores as available in a particular system implementation.
    • 5. Invention grant
    • Methods for scalably exploiting parallelism in a parallel processing system
    • US07937567B1
    • 2011-05-03
    • US11555623
    • 2006-11-01
    • John R. Nickolls; Stephen D. Lew
    • G06F9/30
    • G06F9/3851; G06F9/30072; G06F9/3012; G06F9/3889; G06F9/5066
    • Parallelism in a parallel processing subsystem is exploited in a scalable manner. A problem to be solved can be hierarchically decomposed into at least two levels of sub-problems. Individual threads of program execution are defined to solve the lowest-level sub-problems. The threads are grouped into one or more thread arrays, each of which solves a higher-level sub-problem. The thread arrays are executable by processing cores, each of which can execute at least one thread array at a time. Thread arrays can be grouped into grids of independent thread arrays, which solve still higher-level sub-problems or an entire problem. Thread arrays within a grid, or entire grids, can be distributed across all of the available processing cores as available in a particular system implementation.
    • 6. Invention application
    • PARALLEL DATA PROCESSING SYSTEMS AND METHODS USING COOPERATIVE THREAD ARRAYS
    • US20110087860A1
    • 2011-04-14
    • US12972361
    • 2010-12-17
    • John R. Nickolls; Stephen D. Lew
    • G06F15/16
    • G06F9/544; G06F9/3851; G06F9/3887; G06F9/522
    • Parallel data processing systems and methods use cooperative thread arrays (CTAs), i.e., groups of multiple threads that concurrently execute the same program on an input data set to produce an output data set. Each thread in a CTA has a unique identifier (thread ID) that can be assigned at thread launch time. The thread ID controls various aspects of the thread's processing behavior such as the portion of the input data set to be processed by each thread, the portion of an output data set to be produced by each thread, and/or sharing of intermediate results among threads. Mechanisms for loading and launching CTAs in a representative processing core and for synchronizing threads within a CTA are also described.
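The role of the thread ID in this abstract, selecting which portion of the input each thread reads and which portion of the output it produces, can be sketched as follows. The launch helper and the strided partition are illustrative assumptions, not the patented mechanism:

```python
import threading

# Sketch: a "CTA" launch assigns each thread a unique ID, and that ID
# alone determines the thread's slice of the input and output.

def launch_cta(program, num_threads, inp, out):
    """Launch num_threads threads, each with a unique thread ID."""
    ts = [threading.Thread(target=program, args=(tid, num_threads, inp, out))
          for tid in range(num_threads)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()

def square_slice(tid, n, inp, out):
    # The thread ID picks a strided partition of the data: thread tid
    # handles elements tid, tid + n, tid + 2n, ...
    for i in range(tid, len(inp), n):
        out[i] = inp[i] ** 2

data = [1, 2, 3, 4, 5, 6]
out = [0] * len(data)
launch_cta(square_slice, 3, data, out)
print(out)  # [1, 4, 9, 16, 25, 36]
```

Because every thread runs the same `square_slice` program and only the ID differs, the partitioning is entirely data-driven, which is the property the abstract emphasizes.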
    • 8. Invention grant
    • Counter-based delay of dependent thread group execution
    • US07526634B1
    • 2009-04-28
    • US11535871
    • 2006-09-27
    • Jerome F. Duluk, Jr.; Stephen D. Lew; John R. Nickolls
    • G06F9/40
    • G06F9/52; G06F9/546; G06F2209/548
    • Systems and methods for synchronizing processing work performed by threads, cooperative thread arrays (CTAs), or “sets” of CTAs. A central processing unit can load launch commands for a first set of CTAs and a second set of CTAs in a pushbuffer, and specify a dependency of the second set upon completion of execution of the first set. A parallel or graphics processor (GPU) can autonomously execute the first set of CTAs and delay execution of the second set of CTAs until the first set of CTAs is complete. In some embodiments the GPU may determine that a third set of CTAs is not dependent upon the first set, and may launch the third set of CTAs while the second set of CTAs is delayed. In this manner, the GPU may execute launch commands out of order with respect to the order of the launch commands in the pushbuffer.
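The counter-based dependency this abstract describes, where a second set of CTAs is delayed until a completion count for the first set is reached while an independent third set may launch out of order, can be sketched with a condition-variable counter. The `CompletionCounter` class and all names are illustrative assumptions, not the patented hardware mechanism:

```python
import threading

# Sketch: set B may not start until a completion counter shows that
# all CTAs of set A have finished; an independent set C can run while
# B is delayed, i.e. launches may complete out of submission order.

class CompletionCounter:
    def __init__(self, target):
        self.target = target
        self.count = 0
        self.cond = threading.Condition()

    def signal(self):                 # one CTA of the prerequisite set is done
        with self.cond:
            self.count += 1
            self.cond.notify_all()

    def wait(self):                   # delay a dependent set until target reached
        with self.cond:
            while self.count < self.target:
                self.cond.wait()

order = []
lock = threading.Lock()
counter = CompletionCounter(target=2)

def cta(name, depends_on=None, signals=None):
    if depends_on:
        depends_on.wait()             # B blocks here until both A CTAs signal
    with lock:
        order.append(name)
    if signals:
        signals.signal()

threads = (
    [threading.Thread(target=cta, args=(f"A{i}", None, counter)) for i in range(2)]
    + [threading.Thread(target=cta, args=("B", counter))]
    + [threading.Thread(target=cta, args=("C",))]
)
for t in threads:
    t.start()
for t in threads:
    t.join()

# B always appears after both A0 and A1; C is unordered relative to them.
assert order.index("B") > order.index("A0")
assert order.index("B") > order.index("A1")
print(order)
```

The key design point, matching the abstract, is that the dependency is expressed as a count rather than an ordering: the launcher never serializes C behind B, so independent work proceeds while the dependent set waits.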
    • 9. Invention application
    • DIGITAL MEDIA PROCESSOR
    • US20140055559A1
    • 2014-02-27
    • US13568875
    • 2012-08-07
    • Jen-Hsun Huang; Gerrit A. Slavenburg; Stephen D. Lew; John C. Schafer; Thomas F. Fox; Taner E. Ozcelik
    • G06T15/00; H04N13/00
    • G06T15/005; G06F15/78; G09G5/003; G09G2360/02; H04N13/161
    • Circuits, methods, and apparatus that provide highly integrated digital media processors for digital consumer electronics applications. These digital media processors are capable of performing the parallel processing of multiple format audio, video, and graphics signals. In one embodiment, audio and video signals may be received from a variety of input devices or appliances, such as antennas, VCRs, DVDs, and networked devices such as camcorders and modems, while output audio and video signals may be provided to output devices such as televisions, monitors, and networked devices such as printers and networked video recorders. Another embodiment of the present invention interfaces with a variety of devices such as navigation, entertainment, safety, memory, and networking devices. This embodiment can also be configured for use in a digital TV, set-top box, or home server. In this configuration, video and audio streams may be received from a number of cable, satellite, Internet, and consumer devices.