    • 1. Invention application
    • METHODS FOR SCALABLY EXPLOITING PARALLELISM IN A PARALLEL PROCESSING SYSTEM
    • US20110238955A1
    • 2011-09-29
    • US13099035
    • 2011-05-02
    • John R. Nickolls; Stephen D. Lew
    • G06F9/30
    • G06F9/3851; G06F9/30072; G06F9/3012; G06F9/3889; G06F9/5066
    • Parallelism in a parallel processing subsystem is exploited in a scalable manner. A problem to be solved can be hierarchically decomposed into at least two levels of sub-problems. Individual threads of program execution are defined to solve the lowest-level sub-problems. The threads are grouped into one or more thread arrays, each of which solves a higher-level sub-problem. The thread arrays are executable by processing cores, each of which can execute at least one thread array at a time. Thread arrays can be grouped into grids of independent thread arrays, which solve still higher-level sub-problems or an entire problem. Thread arrays within a grid, or entire grids, can be distributed across all of the available processing cores as available in a particular system implementation.
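The hierarchical decomposition this abstract describes (problem → grid of thread arrays → individual threads) can be sketched in plain Python. The example and all names in it (`grid_work`, `block_work`, `block_dim`) are illustrative assumptions, not taken from the patent:

```python
# Sketch of the hierarchical decomposition described above: a problem
# (summing N numbers) is split into a grid of independent thread
# arrays ("blocks"), and each block into individual threads, each
# solving the lowest-level sub-problem (one element).

def thread_work(data, block_id, thread_id, block_dim):
    """Lowest-level sub-problem: one thread handles one element."""
    i = block_id * block_dim + thread_id
    return data[i] if i < len(data) else 0

def block_work(data, block_id, block_dim):
    """Higher-level sub-problem: one thread array sums its slice."""
    return sum(thread_work(data, block_id, t, block_dim)
               for t in range(block_dim))

def grid_work(data, block_dim=4):
    """Entire problem: a grid of independent thread arrays.
    Because blocks are independent, they could be distributed across
    however many processing cores a particular system provides."""
    grid_dim = (len(data) + block_dim - 1) // block_dim
    return sum(block_work(data, b, block_dim) for b in range(grid_dim))

print(grid_work(list(range(10))))  # 45
```

Because each block touches a disjoint slice of the input, varying `block_dim` or the number of cores changes only the schedule, not the result, which is the scalability property the abstract claims.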
    • 3. Invention grant
    • Synchronization of threads in a cooperative thread array
    • US07788468B1
    • 2010-08-31
    • US11303780
    • 2005-12-15
    • John R. Nickolls; Stephen D. Lew; Brett W. Coon; Peter C. Mills
    • G06F15/00; G06F15/76
    • G06F9/3851; G06F9/30087; G06F9/3009; G06F9/3834; G06F9/3887; G06F9/522
    • A “cooperative thread array,” or “CTA,” is a group of multiple threads that concurrently execute the same program on an input data set to produce an output data set. Each thread in a CTA has a unique thread identifier assigned at thread launch time that controls various aspects of the thread's processing behavior such as the portion of the input data set to be processed by each thread, the portion of an output data set to be produced by each thread, and/or sharing of intermediate results among threads. Different threads of the CTA are advantageously synchronized at appropriate points during CTA execution using a barrier synchronization technique in which barrier instructions in the CTA program are detected and used to suspend execution of some threads until a specified number of other threads also reaches the barrier point.
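The barrier behaviour this abstract describes (suspend each thread until a specified number of peers reaches the barrier, so intermediate results can be shared safely) can be illustrated with Python's `threading.Barrier`. This is a CPU-thread analogy, not the patented GPU mechanism, and all names are illustrative:

```python
import threading

# Sketch of barrier synchronization among the threads of one "CTA":
# each thread writes an intermediate result, waits at the barrier
# until all peers arrive, then safely reads a neighbour's value.

NUM_THREADS = 4
shared = [0] * NUM_THREADS     # intermediate results shared among threads
results = [0] * NUM_THREADS
barrier = threading.Barrier(NUM_THREADS)

def cta_thread(tid):
    shared[tid] = tid * 10                          # produce intermediate result
    barrier.wait()                                  # suspend until all threads arrive
    results[tid] = shared[(tid + 1) % NUM_THREADS]  # now safe to read a peer's value

threads = [threading.Thread(target=cta_thread, args=(t,))
           for t in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # [10, 20, 30, 0]
```

Without the `barrier.wait()` call, a fast thread could read `shared` before its neighbour had written it, which is exactly the race the barrier instruction prevents.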
    • 4. Invention grant
    • Methods for scalably exploiting parallelism in a parallel processing system
    • US08099584B2
    • 2012-01-17
    • US13099035
    • 2011-05-02
    • John R. Nickolls; Stephen D. Lew
    • G06F9/30
    • G06F9/3851; G06F9/30072; G06F9/3012; G06F9/3889; G06F9/5066
    • Parallelism in a parallel processing subsystem is exploited in a scalable manner. A problem to be solved can be hierarchically decomposed into at least two levels of sub-problems. Individual threads of program execution are defined to solve the lowest-level sub-problems. The threads are grouped into one or more thread arrays, each of which solves a higher-level sub-problem. The thread arrays are executable by processing cores, each of which can execute at least one thread array at a time. Thread arrays can be grouped into grids of independent thread arrays, which solve still higher-level sub-problems or an entire problem. Thread arrays within a grid, or entire grids, can be distributed across all of the available processing cores as available in a particular system implementation.
    • 5. Invention grant
    • Methods for scalably exploiting parallelism in a parallel processing system
    • US07937567B1
    • 2011-05-03
    • US11555623
    • 2006-11-01
    • John R. Nickolls; Stephen D. Lew
    • G06F9/30
    • G06F9/3851; G06F9/30072; G06F9/3012; G06F9/3889; G06F9/5066
    • Parallelism in a parallel processing subsystem is exploited in a scalable manner. A problem to be solved can be hierarchically decomposed into at least two levels of sub-problems. Individual threads of program execution are defined to solve the lowest-level sub-problems. The threads are grouped into one or more thread arrays, each of which solves a higher-level sub-problem. The thread arrays are executable by processing cores, each of which can execute at least one thread array at a time. Thread arrays can be grouped into grids of independent thread arrays, which solve still higher-level sub-problems or an entire problem. Thread arrays within a grid, or entire grids, can be distributed across all of the available processing cores as available in a particular system implementation.
    • 6. Invention application
    • PARALLEL DATA PROCESSING SYSTEMS AND METHODS USING COOPERATIVE THREAD ARRAYS
    • US20110087860A1
    • 2011-04-14
    • US12972361
    • 2010-12-17
    • John R. Nickolls; Stephen D. Lew
    • G06F15/16
    • G06F9/544; G06F9/3851; G06F9/3887; G06F9/522
    • Parallel data processing systems and methods use cooperative thread arrays (CTAs), i.e., groups of multiple threads that concurrently execute the same program on an input data set to produce an output data set. Each thread in a CTA has a unique identifier (thread ID) that can be assigned at thread launch time. The thread ID controls various aspects of the thread's processing behavior such as the portion of the input data set to be processed by each thread, the portion of an output data set to be produced by each thread, and/or sharing of intermediate results among threads. Mechanisms for loading and launching CTAs in a representative processing core and for synchronizing threads within a CTA are also described.
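The role of the thread ID in this abstract, selecting which portion of the input each thread reads and which portion of the output it produces, can be sketched as follows. The launch helper and the strided partition are illustrative assumptions, not the patented mechanism:

```python
import threading

# Sketch: a "CTA" launch assigns each thread a unique ID, and that ID
# alone determines the thread's slice of the input and output.

def launch_cta(program, num_threads, inp, out):
    """Launch num_threads threads, each with a unique thread ID."""
    ts = [threading.Thread(target=program, args=(tid, num_threads, inp, out))
          for tid in range(num_threads)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()

def square_slice(tid, n, inp, out):
    # The thread ID picks a strided partition of the data: thread tid
    # handles elements tid, tid + n, tid + 2n, ...
    for i in range(tid, len(inp), n):
        out[i] = inp[i] ** 2

data = [1, 2, 3, 4, 5, 6]
out = [0] * len(data)
launch_cta(square_slice, 3, data, out)
print(out)  # [1, 4, 9, 16, 25, 36]
```

Because every thread runs the same `square_slice` program and only the ID differs, the partitioning is entirely data-driven, which is the property the abstract emphasizes.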
    • 8. Invention grant
    • Counter-based delay of dependent thread group execution
    • US07526634B1
    • 2009-04-28
    • US11535871
    • 2006-09-27
    • Jerome F. Duluk, Jr.; Stephen D. Lew; John R. Nickolls
    • G06F9/40
    • G06F9/52; G06F9/546; G06F2209/548
    • Systems and methods for synchronizing processing work performed by threads, cooperative thread arrays (CTAs), or “sets” of CTAs. A central processing unit can load launch commands for a first set of CTAs and a second set of CTAs in a pushbuffer, and specify a dependency of the second set upon completion of execution of the first set. A parallel or graphics processor (GPU) can autonomously execute the first set of CTAs and delay execution of the second set of CTAs until the first set of CTAs is complete. In some embodiments the GPU may determine that a third set of CTAs is not dependent upon the first set, and may launch the third set of CTAs while the second set of CTAs is delayed. In this manner, the GPU may execute launch commands out of order with respect to the order of the launch commands in the pushbuffer.
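The counter-based dependency this abstract describes, where a second set of CTAs is delayed until a completion count for the first set is reached while an independent third set may launch out of order, can be sketched with a condition-variable counter. The `CompletionCounter` class and all names are illustrative assumptions, not the patented hardware mechanism:

```python
import threading

# Sketch: set B may not start until a completion counter shows that
# all CTAs of set A have finished; an independent set C can run while
# B is delayed, i.e. launches may complete out of submission order.

class CompletionCounter:
    def __init__(self, target):
        self.target = target
        self.count = 0
        self.cond = threading.Condition()

    def signal(self):                 # one CTA of the prerequisite set is done
        with self.cond:
            self.count += 1
            self.cond.notify_all()

    def wait(self):                   # delay a dependent set until target reached
        with self.cond:
            while self.count < self.target:
                self.cond.wait()

order = []
lock = threading.Lock()
counter = CompletionCounter(target=2)

def cta(name, depends_on=None, signals=None):
    if depends_on:
        depends_on.wait()             # B blocks here until both A CTAs signal
    with lock:
        order.append(name)
    if signals:
        signals.signal()

threads = (
    [threading.Thread(target=cta, args=(f"A{i}", None, counter)) for i in range(2)]
    + [threading.Thread(target=cta, args=("B", counter))]
    + [threading.Thread(target=cta, args=("C",))]
)
for t in threads:
    t.start()
for t in threads:
    t.join()

# B always appears after both A0 and A1; C is unordered relative to them.
assert order.index("B") > order.index("A0")
assert order.index("B") > order.index("A1")
print(order)
```

The key design point, matching the abstract, is that the dependency is expressed as a count rather than an ordering: the launcher never serializes C behind B, so independent work proceeds while the dependent set waits.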
    • 9. Invention application
    • DIGITAL MEDIA PROCESSOR
    • US20140055559A1
    • 2014-02-27
    • US13568875
    • 2012-08-07
    • Jen-Hsun Huang; Gerrit A. Slavenburg; Stephen D. Lew; John C. Schafer; Thomas F. Fox; Taner E. Ozcelik
    • G06T15/00; H04N13/00
    • G06T15/005; G06F15/78; G09G5/003; G09G2360/02; H04N13/161
    • Circuits, methods, and apparatus that provide highly integrated digital media processors for digital consumer electronics applications. These digital media processors are capable of performing the parallel processing of multiple format audio, video, and graphics signals. In one embodiment, audio and video signals may be received from a variety of input devices or appliances, such as antennas, VCRs, DVDs, and networked devices such as camcorders and modems, while output audio and video signals may be provided to output devices such as televisions, monitors, and networked devices such as printers and networked video recorders. Another embodiment of the present invention interfaces with a variety of devices such as navigation, entertainment, safety, memory, and networking devices. This embodiment can also be configured for use in a digital TV, set-top box, or home server. In this configuration, video and audio streams may be received from a number of cable, satellite, Internet, and consumer devices.