    • 7. Granted invention patent
    • Specifying different type generalized event and action pair in a processor
    • US06735690B1
    • 2004-05-11
    • US09598566
    • 2000-06-21
    • Edwin F. Barry; Patrick R. Marchand; Gerald G. Pechanek; Charles W. Kurak, Jr.
    • Edwin F. Barry; Patrick R. Marchand; Gerald G. Pechanek; Charles W. Kurak, Jr.
    • G06F15/00
    • G06F9/30054; G06F9/30101; G06F9/30112; G06F9/325
    • A processor with a generalized eventpoint architecture, which is scalable for use in a very long instruction word (VLIW) array processor, such as the manifold array (ManArray) processor is described. In one aspect, generalized processor event (p-event) detection facilities are provided by use of compares to check if an instruction address, a data memory address, an instruction, a data value, arithmetic-condition flags, or other processor change of state eventpoint has occurred. In another aspect, generalized processor action (p-action) facilities are provided to cause a change in the program flow by loading the program counter with a new instruction address, generate an interrupt, signal a semaphore, log or count the p-event, time stamp the event, initiate a background operation, or to cause other p-actions to occur. The generalized facilities are defined in the eventpoint architecture as consisting of a control register and three eventpoint parameters, namely at least one register to compare against, a register containing a second compare register, a vector address, or parameter to be passed, and a count or mask register. Based upon this generalized eventpoint architecture, new capabilities are enabled. For example, auto-looping with capabilities to branch out of a nested auto-loop upon detection of a specified condition, background DMA facilities, the ability to link a chain of p-events together for debug purposes, and others are all important capabilities which are readily obtained.
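The abstract of US06735690B1 describes an eventpoint as a control register plus three parameters: a value to compare against, a vector address (or second compare value), and a count or mask. A minimal sketch of the auto-looping case follows, assuming the p-event is an instruction-address match and the p-action is a program-counter load. The `Eventpoint` class, its field names, and the `run` function are hypothetical illustrations, not the patent's register layout or the ManArray instruction set.

```python
from dataclasses import dataclass


@dataclass
class Eventpoint:
    """Hypothetical model of one eventpoint (names are assumptions)."""
    compare_addr: int     # instruction address that triggers the p-event
    target_addr: int      # vector address loaded into the PC as the p-action
    count: int            # how many times the branch back may still be taken
    enabled: bool = True  # stands in for the control register


def run(program_len, ep, max_steps=1000):
    """Step a toy program counter and return the addresses executed."""
    pc = 0
    trace = []
    for _ in range(max_steps):
        if pc >= program_len:
            break
        trace.append(pc)
        # p-event: the current instruction address equals the compare register
        if ep.enabled and pc == ep.compare_addr and ep.count > 0:
            ep.count -= 1
            pc = ep.target_addr  # p-action: redirect program flow
        else:
            pc += 1
    return trace


trace = run(5, Eventpoint(compare_addr=3, target_addr=1, count=2))
print(trace)
```

With `count=2` the branch back is taken twice, so addresses 1 to 3 execute three times before control falls through to address 4. Other p-actions named in the abstract (interrupts, semaphores, time stamps) are not modeled here.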
    • 8. Granted invention patent
    • Accessing tables in memory banks using load and store address generators sharing store read port of compute register file separated from address register file
    • US06397324B1
    • 2002-05-28
    • US09596103
    • 2000-06-16
    • Edwin Frank Barry; Charles W. Kurak, Jr.; Gerald G. Pechanek; Larry D. Larsen
    • Edwin Frank Barry; Charles W. Kurak, Jr.; Gerald G. Pechanek; Larry D. Larsen
    • G06F9/312
    • G06F9/3004; G06F9/30112; G06F9/3012; G06F9/3013; G06F9/3885; H03M7/425
    • A very long instruction word (VLIW) processor typically requires a large number of register file ports due to the parallel execution of the sub-instructions comprising the VLIW. By splitting a general purpose register file into separate address and compute register files, the number of compute register file ports is significantly reduced. This reduction is particularly evident when multiple load and store execution units with indexed addressing modes are supported. The implication is that a faster register file and dedicated address registers are achieved in the programming model. The savings comes at the cost of providing support for data movement between the compute register file and the address register file. In addition, address arithmetic, table look-up, and store to table functions are desirable functions that cannot be obviously obtained when the address registers are separated from the compute registers. The present approach provides an efficient mechanism for supporting these functions while maintaining separate compute and address register files.
    • 9. Granted invention patent
    • Methods and apparatus to support conditional execution in a VLIW-based array processor with subword execution
    • US06366999B1
    • 2002-04-02
    • US09238446
    • 1999-01-28
    • Thomas L. Drabenstott; Gerald G. Pechanek; Edwin F. Barry; Charles W. Kurak, Jr.
    • Thomas L. Drabenstott; Gerald G. Pechanek; Edwin F. Barry; Charles W. Kurak, Jr.
    • G06F15/80
    • G06F9/30094; G06F9/30036; G06F9/30072; G06F9/30181; G06F9/3842; G06F9/3885; G06F9/3887; G06F9/3891; G06F15/8007
    • General purpose flags (ACFs) are defined and encoded utilizing a hierarchical one-, two- or three-bit encoding. Each added bit provides a superset of the previous functionality. With condition combination, a sequential series of conditional branches based on complex conditions may be avoided and complex conditions can then be used for conditional execution. ACF generation and use can be specified by the programmer. By varying the number of flags affected, conditional operation parallelism can be widely varied, for example, from mono-processing to octal-processing in VLIW execution, and across an array of processing elements (PE)s. Multiple PEs can generate condition information at the same time with the programmer being able to specify a conditional execution in one processor based upon a condition generated in a different processor using the communications interface between the processing elements to transfer the conditions. Each processor in a multiple processor array may independently have different units conditionally operate based upon their ACFs.
    • 10. Granted invention patent
    • Methods and apparatus for efficient cosine transform implementations
    • US06754687B1
    • 2004-06-22
    • US09711218
    • 2000-11-09
    • Charles W. Kurak, Jr.; Gerald G. Pechanek
    • Charles W. Kurak, Jr.; Gerald G. Pechanek
    • G06F17/14
    • G06F9/30014; G06F9/30032; G06F9/30036; G06F9/3885; G06F17/147
    • Many video processing applications, such as the decoding and encoding standards promulgated by the moving picture experts group (MPEG), are time constrained applications with multiple complex compute intensive algorithms such as the two-dimensional 8×8 IDCT. In addition, for encoding applications, cost, performance, and programming flexibility for algorithm optimizations are important design requirements. Consequently, it is of great advantage to meeting performance requirements to have a programmable processor that can achieve extremely high performance on the 2D 8×8 IDCT function. The ManArray 2×2 processor is able to process the 2D 8×8 IDCT in 34-cycles and meet the IEEE standard 1180-1990 for precision of the IDCT. A unique distributed 2D 8×8 IDCT process is presented along with the unique data placement supporting the high performance algorithm. In addition, a scalable 2D 8×8 IDCT algorithm that is operable on a 1×0, 1×1, 1×2, 2×2, 2×3, and further arrays of greater numbers of processors is presented that minimizes the VIM memory size by reuse of VLIWs and streamlines further application processing by having the IDCT results output in a standard row-major order. The techniques are applicable to cosine transforms more generally, such as discrete cosine transforms (DCTs).
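For readers unfamiliar with the 2D 8×8 IDCT named in the abstract of US06754687B1, here is a minimal reference sketch using the textbook row-column decomposition with orthonormal scaling. It is not the distributed ManArray algorithm the patent claims, it makes no attempt at the 34-cycle schedule, and it has not been checked against the IEEE 1180-1990 precision requirements. The function names `idct_1d` and `idct_2d` are chosen here for illustration.

```python
import math

N = 8  # block size used by MPEG-style codecs


def idct_1d(coeffs):
    """Orthonormal 1D inverse DCT (DCT-III) of N coefficients."""
    out = []
    for n in range(N):
        total = 0.0
        for k in range(N):
            # The DC term uses sqrt(1/N); all other terms use sqrt(2/N).
            scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
            total += scale * coeffs[k] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
        out.append(total)
    return out


def idct_2d(block):
    """2D inverse DCT: transform every row, then every column."""
    rows = [idct_1d(row) for row in block]
    cols = [idct_1d([rows[r][c] for r in range(N)]) for c in range(N)]
    # cols[c][r] holds sample (r, c); transpose back to row-major order.
    return [[cols[c][r] for c in range(N)] for r in range(N)]


# A block containing only a DC coefficient decodes to a flat block.
dc_only = [[0.0] * N for _ in range(N)]
dc_only[0][0] = 8.0
pixels = idct_2d(dc_only)
```

With the orthonormal scaling used here, a DC coefficient of 8 yields a flat block of ones, because each 1D pass multiplies the DC term by 1/sqrt(8). The output is returned in row-major order, the same output order the abstract mentions.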