    • 41. Granted patent
    • Pipelined computer with operand context queue to simplify context-dependent execution flow
    • Publication number: US5542058A
    • Publication date: 1996-07-30
    • Application number: US317427
    • Filing date: 1994-10-04
    • Inventors: John E. Brown, III; G. Michael Uhler; John H. Edmondson; Debra Bernstein
    • IPC: F02B 75/02; G06F 9/38; G06F 9/30
    • CPC: G06F 9/3824; G06F 9/383; F02B 2075/025
    • Abstract: A macropipelined microprocessor chip adheres to strict read and write ordering by sequentially buffering operands in queues during instruction decode, then removing the operands in order during instruction execution. Any instruction that requires additional access to memory inserts the requests into the queued sequence (in a specifier queue) such that read and write ordering is preserved. A specifier queue synchronization counter captures synchronization points to coordinate memory request operations among the autonomous instruction decode unit, instruction execution unit, and memory sub-system. The synchronization method does not restrict the benefit of overlapped execution in the pipeline. Another feature is treatment of a variable bit field operand type that does not restrict the location of operand data. Instruction execution flows in a pipelined processor having such an operand type are vastly different depending on whether operand data resides in registers or memory. Thus, an operand context queue (field queue) is used to simplify context-dependent execution flow and increase overlap. The field queue allows the instruction decode unit to issue instructions with variable bit field operands normally, sequentially identifying and fetching operands, and communicating the operand context that specifies register or memory residence across the pipeline boundaries to the autonomous execution unit. The mechanism creates opportunity for increasing the overlap of pipelined functions and greatly simplifies the splitting of execution flows.
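The field-queue idea in the abstract can be illustrated with a toy model (all names here are hypothetical, not from the patent): the decode stage enqueues each variable bit field operand together with a context bit, and the execution stage pops entries in FIFO order and selects the register or memory flow.

```python
from collections import deque

# Toy model of the operand context (field) queue. The decode unit tags
# each variable bit field operand with its context -- register or memory
# residence -- and the execution unit pops entries in FIFO order, using
# the context bit to select the execution flow.
field_queue = deque()

def decode(operand, in_register):
    """Decode stage: identify the operand and enqueue it with its context."""
    field_queue.append((operand, in_register))

def execute():
    """Execute stage: dequeue in order; the context selects the flow."""
    operand, in_register = field_queue.popleft()
    if in_register:
        return ("register-flow", operand)   # no extra memory access needed
    return ("memory-flow", operand)         # must issue a memory request first

decode(0x10, in_register=True)
decode(0x20, in_register=False)
assert execute() == ("register-flow", 0x10)
assert execute() == ("memory-flow", 0x20)
```

Because the context crosses the pipeline boundary inside the queue entry itself, the decode and execute units stay decoupled, which is the overlap the abstract describes.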
    • 42. Granted patent
    • Combined write-operand queue and read-after-write dependency scoreboard
    • Publication number: US5471591A
    • Publication date: 1995-11-28
    • Application number: US969126
    • Filing date: 1992-10-30
    • Inventors: John H. Edmondson; Larry L. Biro
    • IPC: F02B 75/02; G06F 9/38; G06F 12/08; G06F 9/312
    • CPC: G06F 12/0804; G06F 12/0811; G06F 12/0831; G06F 9/3836; G06F 9/3838; G06F 9/3857; F02B 2075/025
    • Abstract: In a pipelined digital computer, an instruction decoder decodes register specifiers from multiple instructions, and stores them in a source queue and a destination queue. An execution unit successively obtains source specifiers of an instruction from the source queue, initiates an operation upon the source specifiers, reads a destination specifier from the destination queue, and retires the result at the specified destination. Read-after-write conflicts may occur because the execution unit may overlap execution of a plurality of instructions. Just prior to beginning execution of a current instruction, the destination queue is checked for conflict between the source specifiers of the current instruction and the destination specifiers of previously issued but not yet retired instructions. When an instruction is issued for execution, its destination specifiers in the destination queue are marked to indicate that they are associated with an executed but not yet retired instruction. In a preferred construction, each entry of the queue has a "write pending" bit that is cleared during a flush and when a read pointer is incremented. An issue pointer identifies the entry of an instruction next to be issued, so that the write-pending bit is set when the issue pointer is incremented. Each entry has two comparators enabled by the write-pending bit to detect a conflict with two source specifiers.
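As a toy Python model (names hypothetical, not from the patent), the combined destination queue and RAW scoreboard can be sketched as queue entries carrying a write-pending bit that enables per-entry comparators:

```python
# Toy sketch of the combined destination queue / RAW scoreboard: each entry
# holds a destination register plus a write-pending bit, set at issue and
# cleared at retire; a source specifier conflicts if it matches any pending
# destination.
class DestQueue:
    def __init__(self):
        self.entries = []              # list of [dest_reg, write_pending]

    def decode(self, dest_reg):
        """Decoder appends a destination specifier (pending bit clear)."""
        self.entries.append([dest_reg, False])

    def issue(self):
        """Issue pointer advances: mark the oldest unissued entry pending."""
        for entry in self.entries:
            if not entry[1]:
                entry[1] = True
                return

    def retire(self):
        """Read pointer advances: the oldest entry leaves the queue."""
        self.entries.pop(0)

    def raw_conflict(self, src_regs):
        """The per-entry 'comparators', enabled by the write-pending bit."""
        return any(pending and dest in src_regs
                   for dest, pending in self.entries)

q = DestQueue()
q.decode(3)
q.issue()                         # instruction writing R3: issued, not retired
assert q.raw_conflict({3, 5})     # a later instruction reading R3 must stall
q.retire()
assert not q.raw_conflict({3, 5}) # result written back; conflict cleared
```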
    • 45. Granted patent
    • Supporting late DRAM bank hits
    • Publication number: US08375163B1
    • Publication date: 2013-02-12
    • Application number: US12326060
    • Filing date: 2008-12-01
    • Inventors: John H. Edmondson; Shane Keil
    • IPC: G06F 12/00; G06F 13/00; G06F 13/28
    • CPC: G06F 13/28
    • Abstract: One embodiment of the invention sets forth a mechanism to transmit commands received from an L2 cache to a bank page within the DRAM. An arbiter unit determines which commands from a command sorter to transmit to a command queue. An activate command associated with the bank page related to the commands is also transmitted to an activate queue. The last command in the command queue is marked as "last." An interlock counter stores a count of "last" commands in the read/write command queue. A DRAM controller transmits activate commands and read/write commands from the activate queue and the command queue to the DRAM. Each time a command marked as "last" is encountered, the DRAM controller decrements the interlock counter. If the count in the interlock counter is zero, then the command marked as "last" is marked as "auto-precharge." The "auto-precharge" command, when processed, causes the bank page to be closed.
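A much-simplified, single-page Python model of the "last"/interlock-counter handshake described above (all names hypothetical; real hardware tracks many bank pages at once):

```python
# Toy model: commands for one bank page queue up; the newest is marked
# "last" and an interlock counter tracks outstanding "last" markers. When
# the controller drains a "last" command and the counter reaches zero, no
# further commands for the page are queued, so the command is upgraded to
# auto-precharge, which closes the bank page.
command_queue = []
interlock = 0

def enqueue(cmd):
    global interlock
    if command_queue and command_queue[-1]["last"]:
        command_queue[-1]["last"] = False     # superseded: page must stay open
        interlock -= 1
    command_queue.append({"cmd": cmd, "last": True, "auto_precharge": False})
    interlock += 1

def drain():
    """DRAM controller pops the oldest command and applies the interlock rule."""
    global interlock
    entry = command_queue.pop(0)
    if entry["last"]:
        interlock -= 1
        if interlock == 0:
            entry["auto_precharge"] = True    # safe to close the bank page
    return entry

enqueue("read A")
enqueue("read B")                 # arrives late: "read A" loses its marker
assert drain()["auto_precharge"] is False
assert drain()["auto_precharge"] is True
```

The late-arriving "read B" is the "late DRAM bank hit" of the title: it keeps the page open past the command that would otherwise have closed it.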
    • 46. Granted patent
    • Managing conflicts on shared L2 bus
    • Publication number: US08321618B1
    • Publication date: 2012-11-27
    • Application number: US12510987
    • Filing date: 2009-07-28
    • Inventors: Shane Keil; John H. Edmondson
    • IPC: G06F 13/00
    • CPC: G06F 13/1605; G06F 12/0859
    • Abstract: One embodiment of the present invention sets forth a mechanism to schedule read data transmissions and write data transmissions between a cache and frame buffer logic on the L2 bus. When processing a read or a write command, a scheduling arbiter examines a bus schedule to determine whether a read-read, read-write, or write-read conflict exists, and allocates an available memory space in a read buffer to store the read data causing the conflict until the read return data transmission can be scheduled. In the case of a write command, the scheduling arbiter then transmits a write request to a request buffer. When processing a write request, the request arbiter examines the request buffers to determine whether a write-write conflict exists. If so, then the request arbiter allocates a memory space in a request buffer to store the write request until the write data transmission can be scheduled.
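Loosely, the conflict-then-buffer policy can be sketched as follows (a toy one-transfer-per-cycle schedule; all names hypothetical and much simpler than the patent's separate read/request buffers):

```python
# Toy shared-bus schedule: at most one data transmission per cycle. When a
# requested cycle is already taken (a conflict), the transfer is parked in
# a buffer slot and slid to the next free cycle.
def schedule_transfer(schedule, cycle, name):
    """Place `name` at `cycle` in `schedule` (dict: cycle -> transfer).
    Returns (granted_cycle, was_buffered)."""
    buffered = cycle in schedule      # conflict detected on the shared bus
    while cycle in schedule:          # hold in a buffer until a cycle is free
        cycle += 1
    schedule[cycle] = name
    return cycle, buffered

bus = {}
assert schedule_transfer(bus, 5, "read r0") == (5, False)
assert schedule_transfer(bus, 5, "write w0") == (6, True)   # conflict: parked
```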
    • 48. Granted patent
    • Systems for efficient retrieval from tiled memory surface to linear memory display
    • Publication number: US07986327B1
    • Publication date: 2011-07-26
    • Application number: US11552082
    • Filing date: 2006-10-23
    • Inventor: John H. Edmondson
    • IPC: G06F 12/10; G06F 13/00; G06F 13/28; G06F 9/26; G06F 9/34
    • CPC: G09G 5/395; G09G 5/363; G09G 2350/00; G09G 2360/122
    • Abstract: Embodiments of the present invention set forth a technique for optimizing the on-chip data path between a memory controller and a display controller within a graphics processing unit (GPU). A row selection field and a sector mask are included within a memory access command transmitted from the display controller to the memory controller indicating which row of data is being requested from memory. The memory controller responds to the memory access command by returning only the row of data corresponding to the requested row to the display controller over the on-chip data path. Any extraneous data received by the memory controller in the process of accessing the specifically requested row of data is stripped out and not transmitted back to the display controller. One advantage of the present invention is that the width of the on-chip data path can be reduced by a factor of two or more as a result of the greater operational efficiency gained by stripping out extraneous data before transmitting the data to the display controller.
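The row-selection/sector-mask idea can be illustrated with a small sketch (hypothetical names; real hardware operates on tiled DRAM bursts, not Python lists):

```python
# Toy model: a tiled surface access pulls in a whole tile (several rows of
# sectors), but the command's row-selection field and sector mask let the
# memory controller forward only the requested row's enabled sectors over
# the narrow on-chip path to the display controller.
def fetch_row(tile, row_select, sector_mask):
    row = tile[row_select]                   # drop the extraneous rows
    return [sector for i, sector in enumerate(row)
            if sector_mask & (1 << i)]       # strip masked-off sectors

tile = [["a0", "a1", "a2", "a3"],
        ["b0", "b1", "b2", "b3"]]
# Request row 1, sectors 0 and 2 only:
assert fetch_row(tile, row_select=1, sector_mask=0b0101) == ["b0", "b2"]
```

Only the filtered data crosses the on-chip path, which is what lets that path be narrower than the raw tile fetch.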
    • 49. Patent application
    • Page stream sorter for poor locality access patterns
    • Publication number: US20080109613A1
    • Publication date: 2008-05-08
    • Application number: US11592540
    • Filing date: 2006-11-03
    • Inventors: David A. Jarosh; Sonny S. Yeoh; Colyn S. Case; John H. Edmondson
    • IPC: G06F 12/00; G06F 17/00
    • CPC: G06F 13/1626
    • Abstract: In some applications, such as video motion compression processing for example, a request pattern or "stream" of requests for accesses to memory (e.g., DRAM) may have, over a large number of requests, a relatively small number of requests to the same page. Due to the small number of requests to the same page, conventionally sorting to aggregate page hits may not be very effective. Reordering the stream can be used to "bury" or "hide" much of the necessary precharge/activate time, which can have a highly positive impact on overall throughput. For example, separating accesses to different rows of the same bank by at least a predetermined number of clocks can effectively hide the overhead involved in precharging/activating the rows.
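A toy model of the reordering heuristic (MIN_GAP and all other names are hypothetical; the patent's sorter is far more elaborate):

```python
# Toy scheduler: if two requests hit *different* rows of the *same* DRAM
# bank, issue them at least MIN_GAP clocks apart so the precharge/activate
# latency is hidden behind other banks' traffic.
MIN_GAP = 4

def schedule(requests):
    """requests: list of (bank, row). Returns a list of (clock, bank, row)."""
    last_issue = {}                   # bank -> (clock, row) of last issue
    clock, issues = 0, []
    for bank, row in requests:
        if bank in last_issue and last_issue[bank][1] != row:
            # same bank, different row: enforce the separation window
            clock = max(clock, last_issue[bank][0] + MIN_GAP)
        issues.append((clock, bank, row))
        last_issue[bank] = (clock, row)
        clock += 1
    return issues

# Bank 0 is re-opened on a new row, so its second access waits until clock 4;
# the bank-1 access at clock 1 fills part of that gap.
assert schedule([(0, 10), (1, 20), (0, 11)]) == [(0, 0, 10), (1, 1, 20), (4, 0, 11)]
```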
    • 50. Granted patent
    • Fast area-efficient multi-bit binary adder with low fan-out signals
    • Publication number: US5278783A
    • Publication date: 1994-01-11
    • Application number: US969124
    • Filing date: 1992-10-30
    • Inventor: John H. Edmondson
    • IPC: G06F 7/50; G06F 7/508
    • CPC: G06F 7/508; G06F 2207/5063
    • Abstract: A carry look-ahead adder obtains high speed with minimum gate fan-in and a regular array of area-efficient logic cells in a datapath by including a first row of propagate-generate bit cells, a second row of block-propagate bit cells generating a hierarchy of block-propagate and block-generate bits, a third row of carry bit cells, and a bottom level of sum bit cells. The second row of block-propagate bit cells supplies the block-propagate and block-generate bits to the first carry bit cells in chained segments of carry bit cells. In a preferred embodiment for a 32-bit complementary metal-oxide-semiconductor (CMOS) adder, the logic gates are limited to a fan-in of three, and the block-propagate bit cells in the second row are interconnected to form two binary trees, each including fifteen cells, and the carry cells are chained in segments including up to four cells. In general, the interconnections between the block-propagate bit cells are derived from a graph which is optimized to meet the constraints of fast static CMOS circuit design: low fan-out and small capacitance load on most signals. Sufficient gain stages are present in the binary trees to build up to a large drive capability where it is needed.
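The propagate/generate and block-cell logic is standard carry look-ahead; a minimal sketch (the carry chain is rippled here for brevity rather than built from the patent's binary trees of block cells):

```python
# Minimal carry look-ahead sketch: per-bit propagate/generate terms mirror
# the patent's first row of cells, and `combine` is one block-propagate cell
# merging a high half with its low half.
def pg(a_bit, b_bit):
    """First-row cell: (propagate, generate) for one bit position."""
    return a_bit ^ b_bit, a_bit & b_bit

def combine(hi, lo):
    """Block cell: merge (P, G) of a high half with its low half."""
    return hi[0] & lo[0], hi[1] | (hi[0] & lo[1])

def cla_add(a, b, width=32):
    """Add using the p/g carry recurrence c[i+1] = g[i] | (p[i] & c[i])."""
    bits = [pg((a >> i) & 1, (b >> i) & 1) for i in range(width)]
    carries, carry = [], 0
    for p, g in bits:
        carries.append(carry)
        carry = g | (p & carry)          # rippled here for brevity
    s = 0
    for i, (p, _) in enumerate(bits):
        s |= (p ^ carries[i]) << i       # sum bit = p XOR incoming carry
    return s

assert cla_add(0x1234, 0xFEDC) == (0x1234 + 0xFEDC) & 0xFFFFFFFF
assert combine(pg(1, 0), pg(1, 1)) == (0, 1)   # block generates a carry
```

In the actual design, `combine` cells form the two binary trees of the second row, so all carries are produced in logarithmic depth instead of this linear ripple.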