    • 1. Granted patent
    • Mixed-modulo address generation using shadow segment registers
    • US5790443A
    • 1998-08-04
    • US618632
    • 1996-03-19
    • Gene Shen, Shalesh Thusoo, James S. Blomgren, Betty Kikuta
    • G06F7/50, G06F7/509, G06F9/30, G06F9/355, G06F12/02, G06F7/38, G06F9/26
    • G06F9/30116, G06F12/0292, G06F7/509, G06F9/3013, G06F9/355, G06F9/3552, G06F7/49931, G06F7/49994
    • A mixed-modulo address generation unit has several inputs. The unit effectively adds together a subset of these inputs in a reduced modulus while simultaneously adding other inputs in a full modulus to the partial sum of reduced-modulus inputs. The subset of inputs receives reduced-width address components such as 16-bit address components which are effectively added together in modulo 64K. The other inputs receive full-width address components such as 32-bit components which are added in the full modulus, 4G. Reduced-width components are zero-extended to 32 bits before input to a standard 32-bit adder. A 16-bit carry generator also receives the reduced-width components and generates the carries out of the 16th bit position. When one or more carries is detected, a correction term is subtracted from the initial sum which is recirculated to the adder's input in a subsequent step. The correction term is the number of carries out of the 16th bit position multiplied by 64K. The full-width segment bases for all active segments are stored in the register file, but the most commonly accessed segments, the data and stack segments, have a copy of their segment bases also stored in a shadow register for input to the adder. Thus the number of read ports to the register file is reduced by the shadow segment register. Less-frequently-used segments require an additional step through the adder to generate the address, but addresses in the data and stack segments are generated in a single cycle.
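The carry-correction arithmetic in the abstract above can be sketched in software. This is a minimal model, not the hardware: the function name, list inputs, and one-shot correction (rather than a recirculated second adder pass) are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF

def mixed_modulo_address(reduced_16bit, full_32bit):
    """Add 16-bit components modulo 64K while adding 32-bit components
    modulo 4G, using one full-width add plus a carry correction."""
    # Zero-extend the reduced-width components and add everything
    # in the full 32-bit modulus, as the standard 32-bit adder would.
    initial_sum = (sum(c & 0xFFFF for c in reduced_16bit)
                   + sum(full_32bit)) & MASK32
    # The 16-bit carry generator counts carries out of bit 15 of the
    # reduced-width components alone.
    carries = sum(c & 0xFFFF for c in reduced_16bit) >> 16
    # Each such carry leaked 64K into the upper half of the sum;
    # subtract the correction term (carries * 64K) back out.
    return (initial_sum - carries * 0x10000) & MASK32
```

For example, a 16-bit offset 0xFFFF plus a displacement of 2 wraps to 1 inside the segment before the 32-bit segment base is added.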
    • 2. Granted patent
    • Stack push/pop tracking and pairing in a pipelined processor
    • US5687336A
    • 1997-11-11
    • US584836
    • 1996-01-11
    • Gene Shen, Shalesh Thusoo, James S. Blomgren
    • G06F9/34, G06F9/38, G06F9/32
    • G06F9/3816, G06F9/34, G06F9/3824
    • A pipelined processor executes several stack instructions simultaneously. Additional shadow registers for stack pointers of instructions in the pipeline are not needed. Instead the new stack pointer is generated once at the end of the pipeline and written to the register file. The stack pointer is needed for generating the stack-top address in memory. The stack-top address is generated early in the pipeline. Other stack instructions in the pipeline which have not yet incremented the stack pointer are located with a stack valid bit array. The stack valid array indicates the increment or decrement amounts for stack instructions in each pipeline stage. An overall displacement or increment value is computed as the sum of all increments and decrements for stack instructions in the pipeline which have not yet updated the stack pointer. The overall displacement which accounts for all unfinished stack instructions is added to the stack pointer from the register file to generate the stack-top address. Thus the new stack pointer does not have to be generated before the stack memory is accessed. Pushes or pops are paired by doubling the increment amount in the stack valid bit array and performing a double-width data transfer.
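The pending-displacement idea can be modeled with a toy in-order pipeline. `StackTopTracker` and its methods are invented names, and the model ignores details such as address size and the exact cycle the SP write occurs.

```python
class StackTopTracker:
    """Track in-flight stack increments/decrements so the stack-top
    address can be formed before the architectural SP is updated."""

    def __init__(self, arch_sp):
        self.arch_sp = arch_sp      # SP as held in the register file
        self.pending = []           # delta per unfinished stack instruction

    def dispatch(self, delta):
        """A push (negative delta) or pop (positive delta) enters the
        pipeline; a paired push/push or pop/pop doubles the delta."""
        self.pending.append(delta)

    def stack_top(self):
        # Overall displacement = sum of all unfinished stack deltas,
        # added to the register-file SP to form the stack-top address.
        return self.arch_sp + sum(self.pending)

    def retire(self):
        """The oldest stack instruction finally updates SP at the end
        of the pipeline."""
        self.arch_sp += self.pending.pop(0)
```

Dispatching two pushes and reading `stack_top()` before either retires mirrors generating the stack-top address without waiting for the new stack pointer.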
    • 3. Granted patent
    • Reduced register-dependency checking for paired-instruction dispatch in a superscalar processor with partial register writes
    • US5790826A
    • 1998-08-04
    • US618636
    • 1996-03-19
    • Shalesh Thusoo, Gene Shen, James S. Blomgren
    • G06F9/30, G06F9/312, G06F9/38, G06F9/28
    • G06F9/30043, G06F9/30112, G06F9/3824, G06F9/3834, G06F9/3836, G06F9/3857
    • The dispatch unit of a superscalar processor checks for register dependencies among instructions to be issued together as a group. The first instruction's destination register is compared to the following instructions' sources, but the destinations of following instructions are not checked against the first instruction's destination. Instead, instructions with destination-destination dependencies are dispatched together as a group. These instructions flow down the pipelines. At the end of the pipelines the destinations are compared. If the destinations match then the results are merged together and written to the register. When instructions write to only a portion of the register, merging ensures that the correct portions of the register are written by the appropriate instructions in the group. Thus older code which performs partial-register writes can benefit from superscalar processing by dispatching the instructions together as a group and then merging the writes together at the end of the pipelines. The dispatch and decode stage, which is often a critical path on the processor, is reduced in complexity by not checking for destination-register dependencies. Performance increases because more kinds of instructions can be dispatched together in a group, increasing the use of the superscalar features.
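The reduced check and the writeback merge can be sketched as two helpers. Register names, byte masks, and the (mask, value) write format are illustrative assumptions, not details from the patent.

```python
def can_dispatch_together(first_dst, second_srcs):
    """Reduced dependency check: only the first instruction's destination
    is compared against the second instruction's sources. A
    destination-destination overlap is deliberately NOT checked here;
    it is allowed and resolved by merging at writeback."""
    return first_dst not in second_srcs

def merge_partial_writes(old_value, writes):
    """Merge partial-register writes in program order at the end of the
    pipelines. Each write is (mask, value): only bits under the mask
    are written, so e.g. an AL write and an AH write both land in the
    correct portions of the full register."""
    result = old_value
    for mask, value in writes:
        result = (result & ~mask) | (value & mask)
    return result
```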
    • 4. Granted patent
    • Debug and video queue for multi-processor chip
    • US5848264A
    • 1998-12-08
    • US740248
    • 1996-10-25
    • Brian R. Baird, David E. Richter, Shalesh Thusoo, David M. Stark, James S. Blomgren
    • G06F11/36, G06F9/455
    • G06F11/3636, G06F11/3656
    • A microprocessor die contains several processor cores and a shared cache. Trigger conditions for one or more of the processor cores are programmed into debug registers. When a trigger is detected, a trace record is generated and loaded into a debug queue on the microprocessor die. Several trace records from different processor cores can be rapidly generated and loaded into the debug queue. The external interface cannot transfer these trace records to an external in-circuit emulator (ICE) at the rate generated. The debug queue transfers trace records to the external ICE using a dedicated bus to the ICE so that bandwidth is not taken from the memory bus. The memory bus is not slowed for debugging, providing a more realistic debugging session. The debug buffer is also used as a video FIFO for buffering pixels for display on a monitor. The dedicated bus is connected to an external DAC rather than to the external ICE when debugging is not being performed.
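The rate mismatch the abstract describes is ordinary FIFO buffering. A minimal sketch, with the capacity and the drop-on-overflow policy as assumptions the abstract does not state:

```python
from collections import deque

class DebugQueue:
    """Buffer trace records generated in bursts by multiple cores, then
    drain them over a slower dedicated bus, one record per transfer,
    without taking bandwidth from the memory bus."""

    def __init__(self, capacity):
        self.records = deque()
        self.capacity = capacity
        self.dropped = 0            # records lost to queue overflow

    def capture(self, core_id, trigger, pc):
        """A trigger fires on some core; enqueue a trace record."""
        if len(self.records) >= self.capacity:
            self.dropped += 1
            return
        self.records.append((core_id, trigger, pc))

    def drain_one(self):
        """Transfer one record over the dedicated bus to the ICE."""
        return self.records.popleft() if self.records else None
```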
    • 5. Granted patent
    • Early instruction-length pre-decode of variable-length instructions in a superscalar processor
    • US5809272A
    • 1998-09-15
    • US564718
    • 1995-11-29
    • Shalesh Thusoo, James S. Blomgren
    • G06F9/30, G06F9/38
    • G06F9/3816, G06F9/30149, G06F9/382, G06F9/3853
    • A superscalar processor can dispatch two instructions per clock cycle. The first instruction is decoded from instruction bytes in a large instruction buffer. A secondary instruction buffer is loaded with a copy of the first few bytes of the second instruction to be dispatched in a cycle. In the previous cycle this secondary instruction buffer is used to determine the length of the second instruction dispatched in that previous cycle. That second instruction's length is then used to extract the first bytes of the third instruction, and its length is also determined. The first bytes of the fourth instruction are then located. When both the first and the second instructions are dispatched, the secondary buffer is loaded with the bytes from the fourth instruction. If only the first instruction is dispatched, then the secondary buffer is loaded with the first bytes of the third instruction. Thus the secondary buffer is always loaded with the starting bytes of undispatched instructions. The starting bytes are found in the previous cycle. Once initialized, two instructions can be issued each cycle. Decoding of both the first and second instructions proceeds without delay since the starting bytes of the second instruction are found in the previous cycle. On the initial cycle after a reset or branch mis-predict, just the first instruction can be issued. The secondary buffer is initially loaded with a copy of the first instruction's starting bytes, allowing the two length decoders to be used to generate the lengths of the first and second instructions or the second and third instructions. Only two, and not three, length decoders are needed.
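The length chaining can be illustrated with a toy encoding. The two-bit length field is an invented stand-in for real variable-length decoding, and `predecode_starts` models only how each instruction's length locates the next instruction's first bytes, not the buffers themselves.

```python
def toy_length(code, pos):
    """Toy variable-length encoding (an assumption for illustration):
    the low two bits of the first byte give length-1, so 1..4 bytes."""
    return (code[pos] & 0x3) + 1

def predecode_starts(code, pos, count=4):
    """Locate the start offsets of the next `count` instructions, as
    the secondary-buffer scheme does one cycle ahead of dispatch: each
    decoded length extracts the first bytes of the next instruction,
    whose length is then decoded in turn."""
    starts = [pos]
    for _ in range(count - 1):
        pos += toy_length(code, pos)
        starts.append(pos)
    return starts
```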
    • 7. Granted patent
    • Method and apparatus for pre-branch instruction
    • US06622240B1
    • 2003-09-16
    • US09496008
    • 2000-02-01
    • Timothy Alan Olson, James S. Blomgren
    • G06F9/38
    • G06F9/3867, G06F9/30058, G06F9/3804, G06F9/3842, G06F9/3846
    • A method and apparatus that minimizes instruction gaps behind a branch instruction in a multistage pipelined processor is disclosed. A pre-branch instruction corresponding to a branch instruction is inserted into the instruction stream a sufficient number of instructions ahead of the branch instruction to ensure that the pre-branch instruction exits the decode stage of the pipeline at the same time the branch instruction exits the first instruction fetch stage of the pipeline. The pre-branch instruction is decoded and causes the instruction fetch unit either to begin fetching instructions at a target address, when the branch is known or predicted to be taken, or to continue fetching instructions along the main execution path, when the branch is known or predicted to be not taken.
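The timing constraint reduces to simple stage arithmetic. Stage indices and addresses below are illustrative, assuming one instruction advances one stage per cycle.

```python
def prebranch_distance(fetch1_stage, decode_stage):
    """How many instructions ahead of the branch the pre-branch must be
    inserted so that it exits decode in the same cycle the branch exits
    the first fetch stage: when the branch is in fetch1, an instruction
    issued `distance` earlier is `distance` stages further along."""
    return decode_stage - fetch1_stage

def next_fetch_address(taken, target, fall_through):
    """On decoding the pre-branch, redirect fetch early: to the target
    when the branch is known or predicted taken, otherwise continue
    along the main execution path."""
    return target if taken else fall_through
```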
    • 8. Granted patent
    • Multiple-state simulation for non-binary logic
    • US06604065B1
    • 2003-08-05
    • US09405474
    • 1999-09-24
    • James S. Blomgren, Fritz A. Boehm
    • G06F17/50
    • G06F17/5022
    • A method of efficiently simulating logic designs comprising signals that are capable of having more than two unique decimal values and one or more unique drive states, such as designs based upon the new N-nary logic design style, is disclosed. The present invention includes a signal model that models N-nary signal value, drive strength, and signal definition information in a specific format that supports the ability of the simulator to simulate the operation of the N-nary logic gates such as adders, buffers, and multiplexers by arithmetically and logically manipulating the unique decimal values of the N-nary signals. The simulator comprises an input logic signal model reader, an arithmetic/logical operator, an output logic signal model generator, and an output message generator that generates one or more output- or input-signal-specific output messages that pack relevant simulation data into a format optimized to the architecture of the simulation host.
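A minimal sketch of such a signal model, assuming each N-nary signal carries one decimal value plus a drive flag; the class, its fields, and the single `nnary_add` gate are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class NnarySignal:
    """One N-nary signal: a decimal value in 0..n-1 plus a drive state,
    standing in for a bundle of 1-of-N wires."""
    value: int
    n: int                 # number of legal values (e.g. 4 for 1-of-4 wires)
    driven: bool = True

def nnary_add(a, b):
    """Simulate an N-nary adder gate arithmetically, by adding the
    decimal signal values instead of evaluating individual wires."""
    n_out = a.n + b.n - 1            # sums range over 0..(a.n-1)+(b.n-1)
    if not (a.driven and b.driven):
        # An undriven input propagates an undriven output.
        return NnarySignal(0, n_out, driven=False)
    return NnarySignal(a.value + b.value, n_out)
```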
    • 10. Granted patent
    • Rounding anticipator for floating point operations
    • US06557021B1
    • 2003-04-29
    • US09527653
    • 2000-03-17
    • Jeffrey S. Brooks, James S. Blomgren
    • G06F7/38
    • G06F7/49947, G06F7/483, G06F7/49936, G06F7/5443
    • A method and apparatus that performs anticipatory rounding of intermediate results in a floating point arithmetic system while the intermediate results are being normalized is disclosed. One embodiment of the present invention includes four logic levels, implemented in N-NARY logic. In the first three logic levels, propagation information is gathered for preselected bit groups from the coarse and medium shift output of the normalizer as those results become available. In the fourth level, an incremented, normalized intermediate single-precision or double-precision mantissa result is produced by combining fine shift output bit values with propagation information for the appropriate top bit group, middle bit group, and bottom bit group. The appropriate bit groups are determined by examining the value of the fine shift select signal.
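The group-propagate principle behind the anticipator can be shown on the increment itself. Group width and mantissa width are illustrative, and the real design overlaps this logic with the normalizer's shift stages rather than running it afterwards.

```python
def anticipatory_increment(mantissa, width=24, group=8):
    """Increment a mantissa using per-group propagate signals, as a
    rounding anticipator can: a group passes the incoming carry onward
    only if all of its bits are ones, so each group's incremented value
    can be formed as soon as that group's bits become available."""
    gmask = (1 << group) - 1
    carry, result = 1, 0
    for shift in range(0, width, group):
        g = (mantissa >> shift) & gmask
        propagate = (g == gmask)              # group is all ones
        result |= ((g + carry) & gmask) << shift
        carry = carry if propagate else 0     # carry ripples only past all-ones groups
    return result
```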