    • 1. Granted invention patent
    • Methods and apparatus to support conditional execution in a VLIW-based array processor with subword execution
    • Publication no.: US06366999B1 (published 2002-04-02)
    • Application no.: US09238446 (filed 1999-01-28)
    • Inventors: Thomas L. Drabenstott; Gerald G. Pechanek; Edwin F. Barry; Charles W. Kurak, Jr.
    • IPC: G06F15/80
    • CPC: G06F9/30094; G06F9/30036; G06F9/30072; G06F9/30181; G06F9/3842; G06F9/3885; G06F9/3887; G06F9/3891; G06F15/8007
    • General purpose flags (ACFs) are defined and encoded using a hierarchical one-, two- or three-bit encoding. Each added bit provides a superset of the previous functionality. With condition combination, a sequential series of conditional branches based on complex conditions may be avoided, and complex conditions can then be used for conditional execution. ACF generation and use can be specified by the programmer. By varying the number of flags affected, conditional operation parallelism can be varied widely, for example from mono-processing to octal-processing in VLIW execution, and across an array of processing elements (PEs). Multiple PEs can generate condition information at the same time, and the programmer can specify conditional execution in one processor based upon a condition generated in a different processor, with the conditions transferred over the communications interface between the processing elements. Each processor in a multiple-processor array may independently have different units operate conditionally based upon their ACFs.
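The ACF idea above, replacing a chain of conditional branches with combined condition flags that gate execution, can be sketched in Python. This is a minimal illustration under stated assumptions: a hypothetical 4-lane subword compare sets one flag per lane, two flag vectors are combined with AND, and a conditional move writes only the lanes whose flag is set. The function names are invented for the example and are not ManArray instructions; the one-, two- and three-bit hierarchical encoding is not modeled.

```python
from typing import List

def compare_gt(a: List[int], b: List[int]) -> List[bool]:
    """Hypothetical subword compare: one condition flag per lane, True where a > b."""
    return [x > y for x, y in zip(a, b)]

def combine_and(f1: List[bool], f2: List[bool]) -> List[bool]:
    """Combine two flag vectors so a compound condition needs no branches."""
    return [p and q for p, q in zip(f1, f2)]

def conditional_move(dst: List[int], src: List[int], flags: List[bool]) -> List[int]:
    """Write src into dst only in lanes whose flag is set; other lanes keep dst."""
    return [s if f else d for d, s, f in zip(dst, src, flags)]

# Branch-free version of, per lane:  out = a if (a > lo and hi > a) else 0
a = [5, 12, 3, 9]
lo = [4, 4, 4, 4]
hi = [10, 10, 10, 10]

in_range = combine_and(compare_gt(a, lo), compare_gt(hi, a))
out = conditional_move([0, 0, 0, 0], a, in_range)
print(in_range)  # [True, False, False, True]
print(out)       # [5, 0, 0, 9]
```

Because the combined flags are ordinary data, the same `in_range` vector could be handed to another unit or another PE and used there, which is the kind of cross-processor use the abstract describes.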
    • 2. Granted invention patent
    • Methods and apparatus to support conditional execution in a VLIW-based array processor with subword execution
    • Publication no.: US06760831B2 (published 2004-07-06)
    • Application no.: US10114652 (filed 2002-04-01)
    • Inventors: Thomas L. Drabenstott; Gerald G. Pechanek; Edwin F. Barry; Charles W. Kurak, Jr.
    • IPC: G06F15/80
    • CPC: G06F9/30094; G06F9/30036; G06F9/30072; G06F9/30181; G06F9/3842; G06F9/3885; G06F9/3887; G06F9/3891; G06F15/8007
    • Abstract: identical to item 1.
    • 5. Granted invention patent
    • Specifying different type generalized event and action pair in a processor
    • Publication no.: US06735690B1 (published 2004-05-11)
    • Application no.: US09598566 (filed 2000-06-21)
    • Inventors: Edwin F. Barry; Patrick R. Marchand; Gerald G. Pechanek; Charles W. Kurak, Jr.
    • IPC: G06F15/00
    • CPC: G06F9/30054; G06F9/30101; G06F9/30112; G06F9/325
    • A processor with a generalized eventpoint architecture is described, which is scalable for use in a very long instruction word (VLIW) array processor such as the manifold array (ManArray) processor. In one aspect, generalized processor event (p-event) detection facilities are provided by using compares to check whether an instruction address, a data memory address, an instruction, a data value, arithmetic-condition flags, or another processor change-of-state eventpoint has occurred. In another aspect, generalized processor action (p-action) facilities are provided to change the program flow by loading the program counter with a new instruction address, generate an interrupt, signal a semaphore, log or count the p-event, time-stamp the event, initiate a background operation, or cause other p-actions to occur. The eventpoint architecture defines the generalized facilities as a control register plus three eventpoint parameters: at least one register to compare against; a register containing a second compare value, a vector address, or a parameter to be passed; and a count or mask register. This generalized eventpoint architecture enables new capabilities. For example, auto-looping with the ability to branch out of a nested auto-loop when a specified condition is detected, background DMA facilities, and the ability to link a chain of p-events together for debugging are all important capabilities that are readily obtained.
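A toy model of the p-event/p-action pairing described above, applied to auto-looping: the event is "PC equals a compare address", the action is "load the PC with a vector address", and a count register limits how often the action fires. The `Eventpoint` class and its fields are hypothetical simplifications, not the patent's register layout; only one eventpoint and instruction-address compares are modeled.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Eventpoint:
    """Hypothetical single eventpoint: one compare value, one vector, one count."""
    compare_addr: int  # p-event: fires when the PC equals this instruction address
    vector: int        # p-action: address loaded into the PC when the event fires
    count: int         # how many more times the p-action may be taken

    def check(self, pc: int) -> Optional[int]:
        """Return the new PC if the event fires and the count allows it, else None."""
        if pc == self.compare_addr and self.count > 0:
            self.count -= 1
            return self.vector
        return None

def run(program_len: int, ep: Eventpoint) -> List[int]:
    """Step a PC through straight-line code, consulting the eventpoint after
    each instruction; returns the sequence of executed addresses."""
    trace: List[int] = []
    pc = 0
    while pc < program_len:
        trace.append(pc)
        new_pc = ep.check(pc)
        pc = new_pc if new_pc is not None else pc + 1
    return trace

# Loop body at addresses 1-2: the event at address 2 branches back to 1 twice,
# so the body executes three times before falling through to address 3.
print(run(4, Eventpoint(compare_addr=2, vector=1, count=2)))
# [0, 1, 2, 1, 2, 1, 2, 3]
```

Branching out of a nested loop on a specified condition would, in this simplified model, need a second eventpoint whose compare tests a data value or flag rather than an address; that case is left out to keep the sketch short.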
    • 6. Granted invention patent
    • Methods and apparatus for efficient cosine transform implementations
    • Publication no.: US06754687B1 (published 2004-06-22)
    • Application no.: US09711218 (filed 2000-11-09)
    • Inventors: Charles W. Kurak, Jr.; Gerald G. Pechanek
    • IPC: G06F17/14
    • CPC: G06F9/30014; G06F9/30032; G06F9/30036; G06F9/3885; G06F17/147
    • Many video processing applications, such as the decoding and encoding standards promulgated by the Moving Picture Experts Group (MPEG), are time-constrained applications with multiple complex, compute-intensive algorithms such as the two-dimensional (2D) 8×8 IDCT. In addition, for encoding applications, cost, performance, and programming flexibility for algorithm optimizations are important design requirements. Consequently, a programmable processor that achieves extremely high performance on the 2D 8×8 IDCT function is of great advantage in meeting performance requirements. The ManArray 2×2 processor is able to process the 2D 8×8 IDCT in 34 cycles and meets IEEE Standard 1180-1990 for IDCT precision. A unique distributed 2D 8×8 IDCT process is presented, along with the unique data placement that supports the high-performance algorithm. In addition, a scalable 2D 8×8 IDCT algorithm is presented that is operable on 1×0, 1×1, 1×2, 2×2, 2×3, and larger processor arrays; it minimizes the VIM (VLIW instruction memory) size by reusing VLIWs and streamlines further application processing by outputting the IDCT results in standard row-major order. The techniques are applicable to cosine transforms more generally, such as discrete cosine transforms (DCTs).
    • 8. Granted invention patent
    • Manifold array processor
    • Publication no.: US06338129B1 (published 2002-01-08)
    • Application no.: US09323609 (filed 1999-06-01)
    • Inventors: Gerald G. Pechanek; Charles W. Kurak, Jr.
    • IPC: G06F15/16
    • CPC: G06F15/17381; G06F9/30076; G06F15/17337; G06F15/8023
    • An array processor includes processing elements arranged in clusters, which are in turn combined in a rectangular array. Each cluster is formed of processing elements that preferably communicate with the processing elements of at least two other clusters. Additionally, each inter-cluster communication path is mutually exclusive; that is, each path carries either north and west, south and east, north and east, or south and west communications. Because the data paths are mutually exclusive, communications between the processing elements of each cluster may be combined in a single inter-cluster path. For example, the north and east communications from one cluster to another cluster may be combined in one path, eliminating half the wiring that path would otherwise require. Additionally, the length of the longest communication path is not directly determined by the overall dimension of the array, as it is in conventional torus arrays; instead, it is limited only by the inter-cluster spacing. In one implementation, transpose elements of an N×N torus are combined in clusters and communicate with one another through intra-cluster communication paths. Because transpose elements are directly connected to one another, this approach eliminates transpose operation latency. Additionally, each PE may have a single transmit port and a single receive port. As a result, the individual PEs are decoupled from the topology of the array.
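The wiring argument above can be checked with a small model. As an assumption, the PEs of an N×N torus are grouped by wrapped anti-diagonal, cluster = (i + j) mod N, which places every PE(i, j) in the same cluster as its transpose PE(j, i); the abstract does not confirm that this is the patent's exact cluster layout. Under that grouping, a PE's north and west neighbors always fall in the same neighboring cluster, as do its south and east neighbors, so each direction pair can share one inter-cluster path.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

PE = Tuple[int, int]

def cluster_of(pe: PE, n: int) -> int:
    """Hypothetical cluster assignment: PEs on the same wrapped anti-diagonal
    (i + j) mod n share a cluster, so PE(i, j) and PE(j, i) always do."""
    i, j = pe
    return (i + j) % n

def torus_neighbors(pe: PE, n: int) -> Dict[str, PE]:
    """North/south/west/east neighbors of a PE in an n x n torus (with wraparound)."""
    i, j = pe
    return {
        "N": ((i - 1) % n, j), "S": ((i + 1) % n, j),
        "W": (i, (j - 1) % n), "E": (i, (j + 1) % n),
    }

def clusters(n: int) -> Dict[int, List[PE]]:
    """Group all PEs of an n x n torus by cluster index."""
    groups: Dict[int, List[PE]] = defaultdict(list)
    for i in range(n):
        for j in range(n):
            groups[cluster_of((i, j), n)].append((i, j))
    return dict(groups)

n = 4
print(clusters(n)[0])  # [(0, 0), (1, 3), (2, 2), (3, 1)]

# For every PE in cluster c, north and west traffic goes to cluster c - 1 and
# south and east traffic goes to cluster c + 1, so each pair can share a path.
for i in range(n):
    for j in range(n):
        c = cluster_of((i, j), n)
        nb = {d: cluster_of(p, n) for d, p in torus_neighbors((i, j), n).items()}
        assert nb["N"] == nb["W"] == (c - 1) % n
        assert nb["S"] == nb["E"] == (c + 1) % n
```

Grouping along the other diagonal, (i - j) mod N, would instead pair north with east and south with west, the other two path types the abstract lists, but it would no longer keep transpose pairs together.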
    • 9. Granted invention patent
    • Manifold array processor
    • Publication no.: US6023753A (published 2000-02-08)
    • Application no.: US885310 (filed 1997-06-30)
    • Inventors: Gerald G. Pechanek; Charles W. Kurak, Jr.
    • IPC: G06F15/173; G06F15/80; G06F15/00
    • CPC: G06F15/17381; G06F15/17337; G06F15/8023; G06F9/30076
    • Abstract: identical to item 8.
    • 10. Granted invention patent
    • Methods and apparatus for abbreviated instruction sets adaptable to configurable processor architecture
    • Publication no.: US06408382B1 (published 2002-06-18)
    • Application no.: US09422015 (filed 1999-10-21)
    • Inventors: Gerald G. Pechanek; Charles W. Kurak, Jr.; Larry D. Larsen
    • IPC: G06F9/45
    • CPC: G06F9/30178; G06F8/4434; G06F9/30156; Y10S707/99935
    • An improved manifold array (ManArray) architecture addresses the problem of configurable, application-specific instruction set optimization and instruction memory reduction using an instruction abbreviation process, thereby further optimizing the general ManArray architecture for high-volume and portable battery-powered products. In the ManArray abbreviation process, a standard 32-bit ManArray instruction is reduced to a shorter instruction format, such as 14 bits. An application is first programmed with the full ManArray instruction set using the native 32-bit instructions. After the application program is completed and verified, an instruction-abbreviation tool analyzes the 32-bit application program and generates the abbreviated program using the abbreviated instructions. This process allows different program-reduction optimizations tailored to each application program and develops an optimized instruction set for the intended application. The abbreviated program, now located in a significantly smaller instruction memory, is functionally equivalent to the original native 32-bit application program. The abbreviated instructions are fetched from this smaller memory and then dynamically translated into native ManArray instruction form in a sequence processor controller. Since the instruction set is now determined for the specific application, an optimized processor design can be easily produced. The system and process can be applied to native instructions with other bit widths and to other processing architectures.
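One simple way to realize the "abbreviate offline, translate back at fetch" flow is a lookup table built from the distinct native words a program actually uses. The sketch below assumes a 14-bit index, as in the abstract's example, and a plain dictionary scheme chosen for illustration; the abstract does not say ManArray's abbreviation works this way, and in the patent the translation is done in the sequence processor controller rather than in software.

```python
from typing import Dict, List, Tuple

def abbreviate(program: List[int], width: int = 14) -> Tuple[List[int], List[int]]:
    """Replace each distinct 32-bit instruction word with a `width`-bit index.

    Returns the abbreviated program and the translation table
    (table[index] == original native word)."""
    table: List[int] = []
    index: Dict[int, int] = {}
    for word in program:
        if word not in index:
            index[word] = len(table)
            table.append(word)
    if len(table) > (1 << width):
        raise ValueError(f"{len(table)} distinct instructions do not fit in {width} bits")
    return [index[w] for w in program], table

def expand(short_program: List[int], table: List[int]) -> List[int]:
    """Translate abbreviated indices back into native 32-bit instruction words."""
    return [table[i] for i in short_program]

native = [0x1A2B3C4D, 0x00000001, 0xDEADBEEF] * 10  # 30 native words, 3 distinct
short, table = abbreviate(native)
assert expand(short, table) == native                # functionally equivalent
print(short[:6])                                     # [0, 1, 2, 0, 1, 2]

bits_before = 32 * len(native)
bits_after = 14 * len(short) + 32 * len(table)       # indices plus the table
print(bits_before, bits_after)                       # 960 516
```

Here 960 bits of native code shrink to 420 bits of indices plus a 96-bit table. The table only pays off when instructions repeat, which is one reason such a scheme would be tuned per application.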