    • 21. Granted invention patent
    • Generating event signals for performance register control using non-operative instructions
    • Publication number: US07809928B1
    • Publication date: 2010-10-05
    • Application number: US11313872
    • Filing date: 2005-12-20
    • Inventors: Roger L. Allen, Brett W. Coon, Ian A. Buck, John R. Nickolls
    • G06F9/30, G06F17/00, G09G5/02
    • G06T1/20, G06F9/30072, G06F9/30076, G06F11/3466, G06F2201/86, G06F2201/865, G06F2201/88
    • One embodiment of an instruction decoder includes an instruction parser configured to process a first non-operative instruction and to generate a first event signal corresponding to the first non-operative instruction, and a first event multiplexer configured to receive the first event signal from the instruction parser, to select the first event signal from one or more event signals and to transmit the first event signal to an event logic block. The instruction decoder may be implemented in a multithreaded processing unit, such as a shader unit, and the occurrences of the first event signal may be tracked when one or more threads are executed within the processing unit. The resulting event signal count may provide a designer with a better understanding of the behavior of a program, such as a shader program, executed within the processing unit, thereby facilitating overall processing unit and program design.
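The abstract above describes a decode-time path: a parser turns a non-operative instruction into an event signal, a multiplexer selects which signal reaches the event logic, and the event logic counts occurrences while threads run. The Python model below is a minimal behavioral sketch of that flow under stated assumptions, not the patented hardware; the opcode names `NOP_EVENT`, `ADD`, `MOV`, the integer event IDs, and the helper `count_events` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instruction:
    opcode: str         # hypothetical mnemonics, e.g. "ADD" or "NOP_EVENT"
    event_id: int = -1  # event signal raised by a non-operative instruction


class InstructionParser:
    """Decodes instructions; a non-operative event instruction raises an event signal."""

    def parse(self, instr: Instruction) -> Optional[int]:
        # The non-operative instruction changes no architectural state;
        # its only effect is to assert the event signal named by event_id.
        if instr.opcode == "NOP_EVENT":
            return instr.event_id
        return None


class EventMultiplexer:
    """Forwards only the event signal currently selected for monitoring."""

    def __init__(self, selected_event: int):
        self.selected_event = selected_event

    def select(self, event_signal: int) -> bool:
        return event_signal == self.selected_event


class EventLogic:
    """Counts occurrences of the forwarded event, like a performance register."""

    def __init__(self):
        self.count = 0

    def record(self) -> None:
        self.count += 1


def count_events(threads, selected_event: int) -> int:
    """Runs every thread's instruction stream and returns the event count."""
    parser = InstructionParser()
    mux = EventMultiplexer(selected_event)
    logic = EventLogic()
    for program in threads:
        for instr in program:
            signal = parser.parse(instr)
            if signal is not None and mux.select(signal):
                logic.record()
    return logic.count


# Mark a loop body with event 1 and a rarely taken exit path with event 2.
loop_body = [Instruction("NOP_EVENT", event_id=1), Instruction("ADD")]
thread_a = [Instruction("MOV")] + loop_body * 3
thread_b = [Instruction("MOV")] + loop_body * 5 + [Instruction("NOP_EVENT", event_id=2)]

print(count_events([thread_a, thread_b], selected_event=1))  # 8
print(count_events([thread_a, thread_b], selected_event=2))  # 1
```

Here the count of event 1 equals the total number of loop iterations across both threads, which is the kind of program-behavior statistic the abstract says a designer could use.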
    • 23. Granted invention patent
    • Register based queuing for texture requests
    • Publication number: US07456835B2
    • Publication date: 2008-11-25
    • Application number: US11339937
    • Filing date: 2006-01-25
    • Inventors: John Erik Lindholm, John R. Nickolls, Simon S. Moy, Brett W. Coon
    • G06T11/40, G06T15/00, G06T1/00, G09G5/00
    • G06T11/60, G09G5/363
    • A graphics processing unit can queue a large number of texture requests to balance out the variability of texture requests without the need for a large texture request buffer. A dedicated texture request buffer queues the relatively small texture commands and parameters. Additionally, for each queued texture command, an associated set of texture arguments, which are typically much larger than the texture command, are stored in a general purpose register. The texture unit retrieves texture commands from the texture request buffer and then fetches the associated texture arguments from the appropriate general purpose register. The texture arguments may be stored in the general purpose register designated as the destination of the final texture value computed by the texture unit. Because the destination register must be allocated for the final texture value as texture commands are queued, storing the texture arguments in this register does not consume any additional registers.
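The key idea in this abstract is a split of storage: the request buffer holds only small commands, while the bulky texture arguments sit in the general purpose register that will later receive the result. The following Python model is a simplified sketch of that bookkeeping, assuming a single texture unit, a list as the register file, and a toy nearest-texel lookup; `TextureUnit`, `nearest_texel`, and the `"SAMPLE"` opcode are hypothetical names, not taken from the patent.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class TextureCommand:
    op: str        # hypothetical opcode, e.g. "SAMPLE" (unused by this toy sampler)
    dest_reg: int  # holds the arguments while queued, then the final texture value


class TextureUnit:
    """Small command FIFO plus a shared general purpose register file."""

    def __init__(self, registers: list, buffer_depth: int):
        self.registers = registers
        self.buffer_depth = buffer_depth
        self.queue = deque()  # stores only the small (op, dest_reg) commands

    def enqueue(self, op: str, dest_reg: int, args: tuple) -> bool:
        if len(self.queue) >= self.buffer_depth:
            return False  # request buffer full: the issuing thread must stall
        # The bulky arguments live in the destination register, which must be
        # allocated for the result anyway, so no extra register is consumed.
        self.registers[dest_reg] = args
        self.queue.append(TextureCommand(op, dest_reg))
        return True

    def process_one(self, sample) -> int:
        cmd = self.queue.popleft()
        args = self.registers[cmd.dest_reg]          # fetch arguments from the register
        self.registers[cmd.dest_reg] = sample(args)  # overwrite them with the result
        return cmd.dest_reg


texture = [[10, 20],
           [30, 40]]


def nearest_texel(args):
    x, y = args  # hypothetical integer texel coordinates
    return texture[y][x]


registers = [None] * 4
unit = TextureUnit(registers, buffer_depth=2)
print(unit.enqueue("SAMPLE", 0, (1, 0)))  # True
print(unit.enqueue("SAMPLE", 1, (0, 1)))  # True
print(unit.enqueue("SAMPLE", 2, (1, 1)))  # False: only two commands fit
unit.process_one(nearest_texel)
unit.process_one(nearest_texel)
print(registers)  # [20, 30, None, None]
```

Because each queued request occupies only its destination register, the number of requests in flight is bounded by the small command buffer and by registers the program already needed, rather than by a large dedicated argument buffer.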
    • 24. Granted invention patent
    • Tracking register usage during multithreaded processing using a scoreboard having separate memory regions and storing sequential register size indicators
    • Publication number: US07434032B1
    • Publication date: 2008-10-07
    • Application number: US11301589
    • Filing date: 2005-12-13
    • Inventors: Brett W. Coon, Peter C. Mills, Stuart F. Oberman, Ming Y. Siu
    • G06F9/30
    • G06F9/3851, G06F9/3838, G06F9/3879, G06F9/3885
    • A scoreboard memory for a processing unit has separate memory regions allocated to each of the multiple threads to be processed. For each thread, the scoreboard memory stores register identifiers of registers that have pending writes. When an instruction is added to an instruction buffer, the register identifiers of the registers specified in the instruction are compared with the register identifiers stored in the scoreboard memory for that instruction's thread, and a multi-bit value representing the comparison result is generated. The multi-bit value is stored with the instruction in the instruction buffer and may be updated as instructions belonging to the same thread complete their execution. Before the instruction is issued for execution, this multi-bit value is checked. If this multi-bit value indicates that none of the registers specified in the instruction have pending writes, the instruction is allowed to issue for execution.
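The scoreboard flow in this abstract (per-thread regions, a multi-bit comparison result stored with each buffered instruction, bits cleared as writes complete, issue when the value is zero) can be modeled in a few lines. This Python sketch assumes a fixed number of slots per thread and represents the multi-bit value as an integer bitmask indexed by slot; the class names and the instruction strings are hypothetical, and the "sequential register size indicators" named in the title are not modeled.

```python
from dataclasses import dataclass


class Scoreboard:
    """One separate region of slots per thread; each slot holds the identifier
    of a register with a pending write, or None when the slot is free."""

    def __init__(self, num_threads: int, slots_per_thread: int):
        self.regions = [[None] * slots_per_thread for _ in range(num_threads)]

    def add_pending_write(self, thread: int, reg: int) -> int:
        region = self.regions[thread]
        slot = region.index(None)  # raises ValueError if the region is full
        region[slot] = reg
        return slot

    def dependency_mask(self, thread: int, regs) -> int:
        # Compare only against this thread's region; bit i is set when
        # slot i holds a pending write to one of the instruction's registers.
        mask = 0
        for slot, pending in enumerate(self.regions[thread]):
            if pending is not None and pending in regs:
                mask |= 1 << slot
        return mask


@dataclass
class BufferedInstruction:
    thread: int
    name: str
    mask: int  # multi-bit comparison result stored alongside the instruction


class InstructionBuffer:
    def __init__(self, scoreboard: Scoreboard):
        self.scoreboard = scoreboard
        self.entries = []

    def add(self, thread: int, name: str, regs) -> None:
        mask = self.scoreboard.dependency_mask(thread, regs)
        self.entries.append(BufferedInstruction(thread, name, mask))

    def write_completed(self, thread: int, slot: int) -> None:
        # Free the scoreboard slot and clear that bit in the same thread's entries.
        self.scoreboard.regions[thread][slot] = None
        for entry in self.entries:
            if entry.thread == thread:
                entry.mask &= ~(1 << slot)

    def issuable(self) -> list:
        # An instruction may issue once none of its registers has a pending write.
        return [e.name for e in self.entries if e.mask == 0]


scoreboard = Scoreboard(num_threads=2, slots_per_thread=4)
ibuf = InstructionBuffer(scoreboard)
slot = scoreboard.add_pending_write(thread=0, reg=5)  # a load into R5 is in flight
ibuf.add(0, "t0: add r6, r5, r1", regs=(6, 5, 1))
ibuf.add(1, "t1: add r6, r5, r1", regs=(6, 5, 1))    # thread 1 has its own region
print(ibuf.issuable())  # ['t1: add r6, r5, r1']
ibuf.write_completed(thread=0, slot=slot)
print(ibuf.issuable())  # ['t0: add r6, r5, r1', 't1: add r6, r5, r1']
```

Storing the comparison result with the instruction means the issue check is a single test against zero, instead of a fresh scoreboard lookup every cycle.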
    • 29. Granted invention patent
    • Maximized memory throughput on parallel processing devices
    • Publication number: US08327123B2
    • Publication date: 2012-12-04
    • Application number: US13069384
    • Filing date: 2011-03-23
    • Inventors: Norbert Juffa, Brett W. Coon
    • G06F9/30
    • G06F9/3887, G06F9/3455, G06F9/3851, G06F9/3889
    • In parallel processing devices, for streaming computations, processing of each data element of the stream may not be computationally intensive, so processing may take relatively little time compared with the memory access times required to read the stream and write the results. Therefore, memory throughput often limits the performance of the streaming computation. Generally stated, methods are provided for achieving improved, optimized, or ultimately maximized memory throughput in such memory-throughput-limited streaming computations. Streaming computation performance is maximized by improving the aggregate memory throughput across the plurality of processing elements and threads. High aggregate memory throughput is achieved by balancing processing loads between threads and groups of threads and a hardware memory interface coupled to the parallel processing devices.
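The abstract does not spell out its balancing scheme, so the sketch below substitutes a widely used technique, the grid-stride loop, and compares it with giving each thread one contiguous chunk. It is an illustration under that assumption, not the method claimed in US08327123B2: element indices stand in for memory addresses, and all function names are hypothetical. The two properties it measures are per-thread load balance and whether the addresses touched by all threads in one loop iteration are contiguous.

```python
def blocked_assignment(num_elements: int, num_threads: int) -> list:
    """Thread t gets one contiguous chunk of ceil(N / T) elements."""
    chunk = -(-num_elements // num_threads)  # ceiling division
    return [list(range(t * chunk, min((t + 1) * chunk, num_elements)))
            for t in range(num_threads)]


def grid_stride_assignment(num_elements: int, num_threads: int) -> list:
    """Thread t gets elements t, t + T, t + 2T, ... (a grid-stride loop)."""
    return [list(range(t, num_elements, num_threads)) for t in range(num_threads)]


def load_imbalance(assignment: list) -> int:
    """Difference between the busiest and the least busy thread."""
    counts = [len(elems) for elems in assignment]
    return max(counts) - min(counts)


def addresses_per_step(assignment: list) -> list:
    """Element indices touched by all threads together at each loop iteration."""
    steps = max(len(elems) for elems in assignment)
    return [[elems[i] for elems in assignment if i < len(elems)] for i in range(steps)]


def is_contiguous(addresses: list) -> bool:
    return all(b - a == 1 for a, b in zip(addresses, addresses[1:]))


for name, assign in [("blocked", blocked_assignment), ("grid-stride", grid_stride_assignment)]:
    work = assign(10, 4)
    steps = addresses_per_step(work)
    print(name, load_imbalance(work), steps, all(is_contiguous(s) for s in steps))
# blocked 2 [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]] False
# grid-stride 1 [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]] True
```

In this small example the grid-stride assignment keeps per-thread counts within one of each other and makes each iteration's accesses contiguous across threads, which a hardware memory interface can typically serve with fewer, wider transactions than the strided pattern of the blocked assignment.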