Patent Quick Search - Quickly search global patents, free patent database for commercial use - IPRDB

21. Granted invention patent

US07809928B1 Generating event signals for performance register control using non-operative instructions (status: in force)
Publication (announcement) number: US07809928B1
Publication (announcement) date: 2010-10-05
Application number: US11313872
Filing date: 2005-12-20
Applicants: Roger L. Allen, Brett W. Coon, Ian A. Buck, John R. Nickolls
Inventors: Roger L. Allen, Brett W. Coon, Ian A. Buck, John R. Nickolls
IPC classification: G06F9/30, G06F17/00, G09G5/02
CPC classification: G06T1/20, G06F9/30072, G06F9/30076, G06F11/3466, G06F2201/86, G06F2201/865, G06F2201/88
Abstract: One embodiment of an instruction decoder includes an instruction parser configured to process a first non-operative instruction and to generate a first event signal corresponding to the first non-operative instruction, and a first event multiplexer configured to receive the first event signal from the instruction parser, to select the first event signal from one or more event signals and to transmit the first event signal to an event logic block. The instruction decoder may be implemented in a multithreaded processing unit, such as a shader unit, and the occurrences of the first event signal may be tracked when one or more threads are executed within the processing unit. The resulting event signal count may provide a designer with a better understanding of the behavior of a program, such as a shader program, executed within the processing unit, thereby facilitating overall processing unit and program design.
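
The counting idea in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the textual instruction format, the convention that a NOP carries an event id as its operand, and the names `parse` and `count_events` are all hypothetical.

```python
def parse(instr):
    """Model of the instruction parser: return an event id for a
    non-operative instruction, or None for any other instruction.
    Assumption: a NOP encodes its event id as its first operand."""
    op, *args = instr.split()
    if op == "NOP" and args:
        return int(args[0])
    return None


def count_events(program, selected_event):
    """Model of the event multiplexer plus a counter: only the selected
    event signal is forwarded and counted."""
    count = 0
    for instr in program:
        event = parse(instr)
        if event is not None and event == selected_event:
            count += 1
    return count


# Hypothetical shader-like program instrumented with NOP markers.
program = ["NOP 1", "ADD r0 r1", "NOP 2", "TEX r2", "NOP 1"]
```

Calling `count_events(program, 1)` counts how often the marker with id 1 is reached, which is the kind of per-event count the abstract says can help a designer understand program behavior.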

22. Granted invention patent

US07542043B1 Subdividing a shader program (status: in force)
Publication (announcement) number: US07542043B1
Publication (announcement) date: 2009-06-02
Application number: US11136346
Filing date: 2005-05-23
Applicants: John Erik Lindholm, Brett W. Coon, Gary M. Tarolli
Inventors: John Erik Lindholm, Brett W. Coon, Gary M. Tarolli
IPC classification: G06T1/20, G06F5/80
CPC classification: G06T1/60, G06F8/4442, G06F9/3834, G06F9/3851, G06F9/3885
Abstract: Methods and apparatus for subdividing a shader program into regions or “phases” of instructions identifiable by phase identifiers (IDs) inserted into the shader program are provided. The phase IDs may be used to constrain execution of the shader program to prohibit texture fetches in later phases from being executed before a texture fetch in a current phase has completed. Other operations (e.g., math operations) within the current phase, however, may be allowed to execute while waiting for the current phase texture fetch to complete.
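
The phase constraint described above can be written as a small predicate. This is a minimal sketch, assuming instructions are tagged with a kind and a phase ID; the dictionary layout and the name `can_issue` are hypothetical, and the abstract does not say how non-texture operations from later phases are handled, so the sketch leaves them unconstrained.

```python
def can_issue(instr, current_phase, pending_fetches):
    """Return True if `instr` may execute now.

    instr: dict with 'kind' ('tex' or 'math') and 'phase' (int phase ID).
    current_phase: phase ID currently being executed.
    pending_fetches: number of current-phase texture fetches still in flight.
    """
    later_phase_fetch = instr["kind"] == "tex" and instr["phase"] > current_phase
    # Only the rule stated in the abstract is encoded: a texture fetch from a
    # later phase must wait until current-phase fetches have completed.
    if later_phase_fetch and pending_fetches > 0:
        return False
    return True
```

With two fetches outstanding in phase 0, a phase-1 texture fetch is held back while phase-0 math operations continue to issue.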

23. Granted invention patent

US07456835B2 Register based queuing for texture requests (status: in force)
Publication (announcement) number: US07456835B2
Publication (announcement) date: 2008-11-25
Application number: US11339937
Filing date: 2006-01-25
Applicants: John Erik Lindholm, John R. Nickolls, Simon S. Moy, Brett W. Coon
Inventors: John Erik Lindholm, John R. Nickolls, Simon S. Moy, Brett W. Coon
IPC classification: G06T11/40, G06T15/00, G06T1/00, G09G5/00
CPC classification: G06T11/60, G09G5/363
Abstract: A graphics processing unit can queue a large number of texture requests to balance out the variability of texture requests without the need for a large texture request buffer. A dedicated texture request buffer queues the relatively small texture commands and parameters. Additionally, for each queued texture command, an associated set of texture arguments, which are typically much larger than the texture command, are stored in a general purpose register. The texture unit retrieves texture commands from the texture request buffer and then fetches the associated texture arguments from the appropriate general purpose register. The texture arguments may be stored in the general purpose register designated as the destination of the final texture value computed by the texture unit. Because the destination register must be allocated for the final texture value as texture commands are queued, storing the texture arguments in this register does not consume any additional registers.
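
The register reuse described in this abstract can be modeled in a few lines. This is an illustrative sketch only: the register file size, the command name `TEX2D`, and the functions `queue_texture_request`, `texture_unit_step`, and `fake_sample` are assumptions made for the example, and the "sampling" is a stand-in rather than real texture filtering.

```python
from collections import deque

registers = [None] * 8     # general purpose register file (size is arbitrary)
request_buffer = deque()   # small dedicated buffer: commands only


def queue_texture_request(command, dest_reg, args):
    """Park the (large) arguments in the destination register and queue
    only the (small) command together with the register index."""
    registers[dest_reg] = args
    request_buffer.append((command, dest_reg))


def texture_unit_step(sample):
    """Pop one command, fetch its arguments from the register, and
    overwrite that register with the final texture value."""
    command, dest_reg = request_buffer.popleft()
    args = registers[dest_reg]
    registers[dest_reg] = sample(command, args)


def fake_sample(command, args):
    # Stand-in for real texture sampling: just combines the coordinates.
    u, v = args
    return (command, u + v)


queue_texture_request("TEX2D", 3, (0.25, 0.5))
texture_unit_step(fake_sample)
```

Register 3 first holds the arguments and afterwards the result, so no register beyond the already allocated destination is consumed.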

24. Granted invention patent

US07434032B1 Tracking register usage during multithreaded processing using a scoreboard having separate memory regions and storing sequential register size indicators (status: in force)
Publication (announcement) number: US07434032B1
Publication (announcement) date: 2008-10-07
Application number: US11301589
Filing date: 2005-12-13
Applicants: Brett W. Coon, Peter C. Mills, Stuart F. Oberman, Ming Y. Siu
Inventors: Brett W. Coon, Peter C. Mills, Stuart F. Oberman, Ming Y. Siu
IPC classification: G06F9/30
CPC classification: G06F9/3851, G06F9/3838, G06F9/3879, G06F9/3885
Abstract: A scoreboard memory for a processing unit has separate memory regions allocated to each of the multiple threads to be processed. For each thread, the scoreboard memory stores register identifiers of registers that have pending writes. When an instruction is added to an instruction buffer, the register identifiers of the registers specified in the instruction are compared with the register identifiers stored in the scoreboard memory for that instruction's thread, and a multi-bit value representing the comparison result is generated. The multi-bit value is stored with the instruction in the instruction buffer and may be updated as instructions belonging to the same thread complete their execution. Before the instruction is issued for execution, this multi-bit value is checked. If this multi-bit value indicates that none of the registers specified in the instruction have pending writes, the instruction is allowed to issue for execution.
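
A per-thread scoreboard of this kind can be sketched as follows. The class and method names are hypothetical, and the sketch simplifies one point: instead of storing the multi-bit value with the instruction and clearing bits as writes complete, it recomputes the value on demand.

```python
class Scoreboard:
    """Toy scoreboard with a separate region (a set) per thread."""

    def __init__(self, num_threads):
        self.pending = [set() for _ in range(num_threads)]

    def mark_pending(self, thread, reg):
        """Record that `reg` of `thread` has a write in flight."""
        self.pending[thread].add(reg)

    def complete(self, thread, reg):
        """Record that the pending write to `reg` has finished."""
        self.pending[thread].discard(reg)

    def dependency_mask(self, thread, regs):
        """Multi-bit comparison result: bit i is set if regs[i]
        has a pending write in this thread's region."""
        mask = 0
        for i, reg in enumerate(regs):
            if reg in self.pending[thread]:
                mask |= 1 << i
        return mask

    def can_issue(self, thread, regs):
        """An instruction may issue only when no operand bit is set."""
        return self.dependency_mask(thread, regs) == 0
```

For example, after `mark_pending(0, 5)`, an instruction of thread 0 that names registers `[4, 5, 6]` gets mask `0b010` and is held, while the same register names in thread 1 are unaffected.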

25. Granted invention patent

US07366878B1 Scheduling instructions from multi-thread instruction buffer based on phase boundary qualifying rule for phases of math and data access operations with better caching (status: in force)
Publication (announcement) number: US07366878B1
Publication (announcement) date: 2008-04-29
Application number: US11404196
Filing date: 2006-04-13
Applicants: Peter C. Mills, John Erik Lindholm, Brett W. Coon, Gary M. Tarolli, John Matthew Burgess
Inventors: Peter C. Mills, John Erik Lindholm, Brett W. Coon, Gary M. Tarolli, John Matthew Burgess
IPC classification: G06F9/50
CPC classification: G06F9/3851, G06F9/3838, G06F9/3853, G06F9/3867, G06F9/3885
Abstract: A processor buffers asynchronous threads. Current instructions requiring operations provided by a plurality of execution units are divided into phases, each phase having at least one math operation and at least one texture cache access operation. Instructions within each phase are qualified and prioritized, with texture cache access operations in a subsequent phase not qualified until all of the texture cache access operations in a current phase have completed. The instructions may be qualified based on the status of the execution unit needed to execute one or more of the instructions. The instructions may also be qualified based on an age of each instruction, a divergence potential, locality, thread diversity, and resource requirements. Qualified instructions may be prioritized based on execution units needed to execute current instructions and the execution units in use. One or more of the prioritized instructions is issued per cycle to the plurality of execution units.
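
The qualify, prioritize, and issue steps can be sketched for a single cycle. This is a loose illustration, not the claimed scheduler: it qualifies only on execution unit status, omits the phase rule and the other listed criteria, and the oldest-first ordering, the `max_issue` limit, and the name `select_for_issue` are assumptions.

```python
def select_for_issue(buffer, busy_units, max_issue=2):
    """Pick instructions to issue this cycle.

    buffer: list of dicts with 'id', 'unit' (needed execution unit), 'age'.
    busy_units: set of execution units that cannot accept work this cycle.
    """
    # Qualify: the needed execution unit must be available.
    qualified = [i for i in buffer if i["unit"] not in busy_units]
    # Prioritize: oldest first (an assumption made for this sketch).
    qualified.sort(key=lambda i: -i["age"])
    issued, used = [], set()
    for instr in qualified:
        # Issue at most one instruction per unit and max_issue per cycle.
        if instr["unit"] not in used and len(issued) < max_issue:
            issued.append(instr)
            used.add(instr["unit"])
    return issued
```

When the texture unit is busy, only math instructions qualify, and the oldest of them is issued.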

26. Granted invention patent

US08751771B2 Efficient implementation of arrays of structures on SIMT and SIMD architectures (status: in force)
Publication (announcement) number: US08751771B2
Publication (announcement) date: 2014-06-10
Application number: US13247855
Filing date: 2011-09-28
Applicants: Brian Fahs, Henry Packard Moreton, Brett W. Coon, Kathleen Elliott Nickolls
Inventors: Brian Fahs, John R. Nickolls, Henry Packard Moreton, Brett W. Coon
IPC classification: G06F12/00, G06F13/00, G06F13/28, G06F9/26, G06F9/34, G06F9/38, G06F9/30
CPC classification: G06F9/3885, G06F9/30036, G06F9/3009, G06F9/30123, G06F9/345, G06F9/3824, G06F9/3851, G06F9/3887, G06F12/0207, G06T1/20
Abstract: One embodiment of the present invention sets forth a technique providing an optimized way to allocate and access memory across a plurality of thread/data lanes. Specifically, the device driver receives an instruction targeted to a memory set up as an array of structures of arrays. The device driver computes an address within the memory using information about the number of thread/data lanes and parameters from the instruction itself. The result is a memory allocation and access approach where the device driver properly computes the target address in the memory. Advantageously, processing efficiency is improved where memory in a parallel processing subsystem is internally stored and accessed as an array of structures of arrays, proportional to the SIMT/SIMD group width (the number of threads or lanes per execution group).
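
The abstract does not give the address formula, so the following is only a generic array-of-structures-of-arrays (AoSoA) address computation for illustration. It assumes every field has the same element size; the function name `aosoa_address` and its parameters are hypothetical.

```python
def aosoa_address(base, index, field, num_fields, width, elem_size=4):
    """Byte address of `field` of logical element `index` in an AoSoA layout.

    Elements are grouped into blocks of `width` lanes (the SIMT/SIMD group
    width). Within a block, each field is stored as `width` contiguous values.
    """
    block, lane = divmod(index, width)
    return base + ((block * num_fields + field) * width + lane) * elem_size
```

With `width=4`, lanes 0 to 3 reading the same field touch four consecutive 4-byte words, which is the access pattern that makes this layout attractive when all lanes of a group execute the same instruction.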

27. Granted invention patent

US08732713B2 Thread group scheduler for computing on a parallel thread processor (status: in force)
Publication (announcement) number: US08732713B2
Publication (announcement) date: 2014-05-20
Application number: US13247819
Filing date: 2011-09-28
Applicants: Brett W. Coon, John Erik Lindholm, Robert J. Stoll, Nicholas Wang, Jack Hilaire Choquette, Kathleen Elliott Nickolls
Inventors: Brett W. Coon, John R. Nickolls, John Erik Lindholm, Robert J. Stoll, Nicholas Wang, Jack Hilaire Choquette
IPC classification: G06F9/46
CPC classification: G06F9/4881, G06F2209/483
Abstract: A parallel thread processor executes thread groups belonging to multiple cooperative thread arrays (CTAs). At each cycle of the parallel thread processor, an instruction scheduler selects a thread group to be issued for execution during a subsequent cycle. The instruction scheduler selects a thread group to issue for execution by (i) identifying a pool of available thread groups, (ii) identifying a CTA that has the greatest seniority value, and (iii) selecting the thread group that has the greatest credit value from within the CTA with the greatest seniority value.
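
The three selection steps (i) to (iii) map directly onto a short function. This is a sketch under assumptions: the data layout and the name `select_thread_group` are hypothetical, the most senior CTA is chosen among CTAs that have at least one available thread group, and ties are broken by list order.

```python
def select_thread_group(groups, seniority):
    """Select the thread group to issue next, or None if none is available.

    groups: list of dicts with 'id', 'cta', 'available', 'credit'.
    seniority: dict mapping CTA name to its seniority value.
    """
    # (i) Pool of available thread groups.
    pool = [g for g in groups if g["available"]]
    if not pool:
        return None
    # (ii) CTA with the greatest seniority value among the pooled groups.
    best_cta = max((g["cta"] for g in pool), key=lambda c: seniority[c])
    # (iii) Thread group with the greatest credit value within that CTA.
    candidates = [g for g in pool if g["cta"] == best_cta]
    return max(candidates, key=lambda g: g["credit"])
```

A thread group with a high credit value is still passed over if it is unavailable or belongs to a less senior CTA.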

28. Granted invention patent

US08645638B2 Shared single-access memory with management of multiple parallel requests (status: in force)
Publication (announcement) number: US08645638B2
Publication (announcement) date: 2014-02-04
Application number: US13466057
Filing date: 2012-05-07
Applicants: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills
Inventors: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills
IPC classification: G06F12/00, G06F13/00
CPC classification: G06F12/084, Y02D10/13
Abstract: A memory is used by concurrent threads in a multithreaded processor. Any addressable storage location is accessible by any of the concurrent threads, but only one location at a time is accessible. The memory is coupled to parallel processing engines that generate a group of parallel memory access requests, each specifying a target address that might be the same or different for different requests. Serialization logic selects one of the target addresses and determines which of the requests specify the selected target address. All such requests are allowed to proceed in parallel, while other requests are deferred. Deferred requests may be regenerated and processed through the serialization logic so that a group of requests can be satisfied by accessing each different target address in the group exactly once.
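
The serialization behavior can be modeled as a loop over passes. This is an illustrative sketch: the function name `serialize` is hypothetical, and picking the address of the first pending request as the selected target is an assumption, since the abstract does not say how the target address is chosen.

```python
def serialize(requests):
    """Split a group of parallel requests into single-address passes.

    requests: list of (thread_id, address) pairs.
    Returns a list of (address, [thread_ids]) passes, one per distinct address.
    """
    pending = list(requests)
    passes = []
    while pending:
        # Select one target address (assumption: the first pending request's).
        target = pending[0][1]
        # All requests to that address proceed in parallel in this pass.
        served = [t for t, a in pending if a == target]
        passes.append((target, served))
        # Every other request is deferred and considered again next pass.
        pending = [(t, a) for t, a in pending if a != target]
    return passes
```

The number of passes equals the number of distinct addresses in the group, so a group whose threads all read the same location is served in a single access.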

29. Granted invention patent

US08327123B2 Maximized memory throughput on parallel processing devices (status: in force)
Publication (announcement) number: US08327123B2
Publication (announcement) date: 2012-12-04
Application number: US13069384
Filing date: 2011-03-23
Applicants: Norbert Juffa, Brett W. Coon
Inventors: Norbert Juffa, Brett W. Coon
IPC classification: G06F9/30
CPC classification: G06F9/3887, G06F9/3455, G06F9/3851, G06F9/3889
Abstract: In parallel processing devices, for streaming computations, processing of each data element of the stream may not be computationally intensive and thus processing may take relatively small amounts of time to compute as compared to memory access times required to read the stream and write the results. Therefore, memory throughput often limits the performance of the streaming computation. Generally stated, provided are methods for achieving improved, optimized, or ultimately, maximized memory throughput in such memory-throughput-limited streaming computations. Streaming computation performance is maximized by improving the aggregate memory throughput across the plurality of processing elements and threads. High aggregate memory throughput is achieved by balancing processing loads between threads and groups of threads and a hardware memory interface coupled to the parallel processing devices.

30. Invention patent application

US20120089792A1 EFFICIENT IMPLEMENTATION OF ARRAYS OF STRUCTURES ON SIMT AND SIMD ARCHITECTURES (status: in force)
Publication (announcement) number: US20120089792A1
Publication (announcement) date: 2012-04-12
Application number: US13247855
Filing date: 2011-09-28
Applicants: Brian FAHS, John R. Nickolls, Kathleen Elliott Nickolls, Henry Packard Moreton, Brett W. Coon
Inventors: Brian FAHS, John R. Nickolls, Kathleen Elliott Nickolls, Henry Packard Moreton, Brett W. Coon
IPC classification: G06F12/00
CPC classification: G06F9/3885, G06F9/30036, G06F9/3009, G06F9/30123, G06F9/345, G06F9/3824, G06F9/3851, G06F9/3887, G06F12/0207, G06T1/20
Abstract: One embodiment of the present invention sets forth a technique providing an optimized way to allocate and access memory across a plurality of thread/data lanes. Specifically, the device driver receives an instruction targeted to a memory set up as an array of structures of arrays. The device driver computes an address within the memory using information about the number of thread/data lanes and parameters from the instruction itself. The result is a memory allocation and access approach where the device driver properly computes the target address in the memory. Advantageously, processing efficiency is improved where memory in a parallel processing subsystem is internally stored and accessed as an array of structures of arrays, proportional to the SIMT/SIMD group width (the number of threads or lanes per execution group).
