IPRDB Patent Quick Search: fast search of global patents, a free database of commercially usable patents

21. Invention application

US20090150874A1 BINARY PROGRAMMABLE METHOD FOR APPLICATION PERFORMANCE DATA COLLECTION (status: in force)
Publication No.: US20090150874A1
Publication date: 2009-06-11
Application No.: US11952922
Application date: 2007-12-07
Applicants: I-Hsin Chung, Kattamuri Ekanadham, David Joseph Klepacki, Simone Sbaraglia, Robert Edward Walkup, Hui-Fang Wen, Hao Yu
Inventors: I-Hsin Chung, Kattamuri Ekanadham, David Joseph Klepacki, Simone Sbaraglia, Robert Edward Walkup, Hui-Fang Wen, Hao Yu
IPC classification: G06F9/45, G06F9/44
CPC classification: G06F11/3466, G06F8/4441, G06F2201/865
Abstract: A method for application performance data collection includes steps or acts of: customizing a performance tool for collecting application performance data of an application; modifying the application by inserting the performance tool while the application does not need to be rebuilt from the source; executing the application; and collecting the application execution performance data such that only interesting data is collected. Customizing the performance tool proceeds by implementing at least one configurable tracing function that can be programmed by the user; compiling the function(s) into an object file; and inserting the object file into the performance tool using binary instrumentation.

22. Granted invention patent

US06978360B2 Scalable processor (status: lapsed)
Publication No.: US06978360B2
Publication date: 2005-12-20
Application No.: US09854243
Application date: 2001-05-11
Applicants: Gianfranco Bilardi, Kattamuri Ekanadham, Pratap Chandra Pattnaik
Inventors: Gianfranco Bilardi, Kattamuri Ekanadham, Pratap Chandra Pattnaik
IPC classification: G06F9/30, G06F9/312, G06F9/34, G06F9/38
CPC classification: G06F9/30043, G06F9/383, G06F9/3834, G06F9/3885
Abstract: A method and apparatus for issuing and executing memory instructions from a computer system so as to (1) maximize the number of requests issued to a highly pipelined memory, the only limitation being data dependencies in the program, and (2) avoid reading data from memory before a corresponding write to memory. The memory instructions are organized to read and write into memory, by using explicit move instructions, thereby avoiding any data storage limitations in the processor. The memory requests are organized to carry complete information, so that they can be processed independently when memory returns the requested data. The memory is divided into a number of regions, each of which is associated with a fence counter. The fence counter for a memory region is incremented each time a memory instruction that is targeted to the memory region is issued and decremented each time there is a write to the memory region. After a fence instruction is issued, no further memory instructions are issued if the counter for the memory region specified in the fence instruction is above a threshold. When a sufficient number of the outstanding issued instructions are executed, the counter will be decremented below the threshold and further memory instructions are then issued.
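
The fence-counter behaviour described in this abstract can be illustrated with a small software model. This is only a sketch under assumptions, not the patented hardware: the class name `FenceCounters`, its method names, the default threshold of 0, and the choice to stall while the counter is strictly above the threshold are all hypothetical.

```python
class FenceCounters:
    """Toy model of per-region fence counters (all names are illustrative)."""

    def __init__(self, num_regions, threshold=0):
        self.counters = [0] * num_regions
        self.threshold = threshold      # assumed value; the abstract gives none
        self.fenced_region = None       # region named by a pending fence, if any

    def fence(self, region):
        # A fence instruction names the region whose counter gates further issue.
        self.fenced_region = region

    def issue(self, region):
        # While a fence is pending and its region's counter is above the
        # threshold, no further memory instruction is issued.
        if (self.fenced_region is not None
                and self.counters[self.fenced_region] > self.threshold):
            return False
        self.fenced_region = None       # fence satisfied (or none pending)
        self.counters[region] += 1      # incremented on issue to this region
        return True

    def write_done(self, region):
        # Decremented each time a write to the region takes place.
        self.counters[region] -= 1
```

In this model, two instructions issued to region 0 followed by `fence(0)` block every later `issue` call until `write_done(0)` has been called twice.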

23. Granted invention patent

US5802582A Explicit coherence using split-phase controls (status: lapsed)
Publication No.: US5802582A
Publication date: 1998-09-01
Application No.: US711750
Application date: 1996-09-10
Applicants: Kattamuri Ekanadham, Beng-Hong Lim, Pratap Chandra Pattnaik
Inventors: Kattamuri Ekanadham, Beng-Hong Lim, Pratap Chandra Pattnaik
IPC classification: G06F9/46, G06F12/08, G06F15/177, G06F12/14
CPC classification: G06F9/52, G06F12/0815
Abstract: A method and apparatus for maintaining cache coherence in a shared memory multiprocessor system, where cache coherence is preserved between lock acquires and releases rather than at every single memory load and store. With this invention, a Global Lock Manager (GLM) keeps track of the status of locked ranges without the need to maintain a list of individual processors in the system. Further, a Recently Acquired Lock Manager (RALM) keeps track of the status of locked ranges within a processing node to reduce the need to communicate with a GLM.

24. Granted invention patent

US5802288A Integrated communications for pipelined computers (status: lapsed)
Publication No.: US5802288A
Publication date: 1998-09-01
Application No.: US589076
Application date: 1996-01-23
Applicants: Kattamuri Ekanadham, Ronald Mraz
Inventors: Kattamuri Ekanadham, Ronald Mraz
IPC classification: G06F13/38, G06F15/173
CPC classification: G06F13/387
Abstract: This document describes a feature that can be added to existing pipelined architectures (such as RISC) to enhance packet based or message passing communications. The feature integrates the communication interface directly into the pipeline of the processor, offering the potential to greatly reduce latency and overhead for fine grain communications. Additionally, a second interface is provided to maintain high bandwidth for large blocks of data.

25. Granted invention patent

US08683175B2 Seamless interface for multi-threaded core accelerators (status: in force)
Publication No.: US08683175B2
Publication date: 2014-03-25
Application No.: US13048214
Application date: 2011-03-15
Applicants: Kattamuri Ekanadham, Hung Q. Le, Jose E. Moreira, Pratap C. Pattnaik
Inventors: Kattamuri Ekanadham, Hung Q. Le, Jose E. Moreira, Pratap C. Pattnaik
IPC classification: G06F9/30, G06F12/10
CPC classification: G06F9/3877, G06F9/30043, G06F9/3012, G06F9/30123, G06F9/3851, G06F12/1027
Abstract: A method, system and computer program product are disclosed for interfacing between a multi-threaded processing core and an accelerator. In one embodiment, the method comprises copying from the processing core to the hardware accelerator memory address translations for each of multiple threads operating on the processing core, and simultaneously storing on the hardware accelerator one or more of the memory address translations for each of the threads. Whenever any one of the multiple threads operating on the processing core instructs the hardware accelerator to perform a specified operation, the hardware accelerator has stored thereon one or more of the memory address translations for that thread. This facilitates starting that specified operation without memory translation faults. In an embodiment, the copying includes, each time one of the memory address translations is updated on the processing core, copying the updated one of the memory address translations to the hardware accelerator.

26. Granted invention patent

US07107399B2 Scalable memory (status: lapsed)
Publication No.: US07107399B2
Publication date: 2006-09-12
Application No.: US09854213
Application date: 2001-05-11
Applicants: Gianfranco Bilardi, Kattamuri Ekanadham, Pratap Chandra Pattnaik
Inventors: Gianfranco Bilardi, Kattamuri Ekanadham, Pratap Chandra Pattnaik
IPC classification: G06F12/00
CPC classification: G11C7/1057, G06F12/08, G11C7/10, G11C7/1006, G11C7/1051, G11C7/1078, G11C7/1084
Abstract: A memory structure and method for handling memory requests from a processor and for returning corresponding responses to the processor from various levels of the memory structure. The memory levels of the memory structure are interconnected by a forward and return path with the return path having twice the bandwidth of the forward path. An algorithm is used to determine how many responses are sent from each memory level on the return path to the processor. This algorithm is designed to guarantee a constant bound on the rate of responses sent to the processor. More specifically, if a write request is at the same level to which it is targeted, or if a request at a memory level is targeted to a higher memory level, then two responses are forwarded from a controller at the memory level on the return path to the processor. Otherwise, only one response is forwarded from the memory level on the return path.
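
The forwarding rule stated at the end of this abstract can be written as a single predicate. The sketch below is a hedged illustration only: the function name `responses_to_forward`, the `"read"`/`"write"` strings, and the convention that a higher memory level has a larger integer index are assumptions, not details taken from the patent.

```python
def responses_to_forward(kind, current_level, target_level):
    """Number of responses a level's controller forwards on the return path.

    Assumes levels are integers and "higher level" means a larger index.
    """
    write_at_target = kind == "write" and current_level == target_level
    targets_higher_level = target_level > current_level
    # Two responses in either case named by the abstract, otherwise one.
    return 2 if (write_at_target or targets_higher_level) else 1
```

Under these assumptions, a write that has reached its target level, or any request still bound for a higher level, yields 2; every other case yields 1.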

27. Granted invention patent

US5893922A Home node migration for distributed shared memory systems (status: lapsed)
Publication No.: US5893922A
Publication date: 1999-04-13
Application No.: US813814
Application date: 1997-03-06
Applicants: Sandra Johnson Baylor, Kattamuri Ekanadham, Joefon Jann, Beng-Hong Lim, Pratap Chandra Pattnaik
Inventors: Sandra Johnson Baylor, Kattamuri Ekanadham, Joefon Jann, Beng-Hong Lim, Pratap Chandra Pattnaik
IPC classification: G06F9/50, G06F12/08, G06F13/00
CPC classification: G06F9/5016, G06F12/0813
Abstract: A mechanism to dynamically migrate a home node of a global page to a more suitable node for improving performance of parallel applications running on an S-COMA and other DSM systems. More specifically, consultation counts are maintained at each client node of a shared memory system, where the consultation count indicates the number of times the client node has consulted the dynamic home node for lines of a page. This information is then used along with other information to decide on whether to change the dynamic home node to a more suitable node.
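
A minimal sketch of how per-client consultation counts might feed a migration decision follows. The abstract says only that the counts are used "along with other information", so the policy here (move the home to the most frequent consulter once it reaches `min_count`) is an invented placeholder, and the names `ConsultationCounts`, `record`, `suggest_home` and `min_count` are hypothetical.

```python
from collections import defaultdict


class ConsultationCounts:
    """Tracks how often each client node consults the home node of a page."""

    def __init__(self):
        # page -> client node -> number of consultations
        self.counts = defaultdict(lambda: defaultdict(int))

    def record(self, page, client_node):
        self.counts[page][client_node] += 1

    def suggest_home(self, page, current_home, min_count=100):
        # Placeholder policy (an assumption, not the patented decision rule):
        # pick the busiest client if it has consulted at least min_count times.
        per_node = self.counts.get(page)
        if not per_node:
            return current_home
        node, count = max(per_node.items(), key=lambda item: item[1])
        return node if count >= min_count else current_home
```

A real system would weigh further inputs before migrating; the threshold here merely stands in for that unspecified "other information".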

28. Granted invention patent

US5347639A Self-parallelizing computer system and method (status: lapsed)
Publication No.: US5347639A
Publication date: 1994-09-13
Application No.: US731224
Application date: 1991-07-15
Applicants: Rudolph N. Rechtschaffen, Kattamuri Ekanadham
Inventors: Rudolph N. Rechtschaffen, Kattamuri Ekanadham
IPC classification: G06F9/38, G06F9/44, G06F15/16, G06F15/177, G06F9/30
CPC classification: G06F8/45
Abstract: A self-parallelizing computer system and method asynchronously processes execution sequences of instructions in two modes of execution on a set of processing elements which communicate with each other. Each processing element is capable of decoding instructions, generating memory operand addresses, executing instructions and referencing and updating its own set of general purpose registers. These processing elements act in concert during the first mode of execution not only to execute the instructions in an execution sequence but also to partition an execution sequence into separate instruction subsequences. The separate instruction subsequences are stored along with additional information which will allow the stored subsequences to be correctly executed in parallel. Subsequent re-execution of the same execution sequence is done much faster in the second mode of execution, since each of the processing elements decodes and executes only the instructions in one of the subsequences while the other processing elements are concurrently each doing the same with another one of the subsequences.

29. Granted invention patent

US08527959B2 Binary programmable method for application performance data collection (status: in force)
Publication No.: US08527959B2
Publication date: 2013-09-03
Application No.: US11952922
Application date: 2007-12-07
Applicants: I-Hsin Chung, Kattamuri Ekanadham, David Joseph Klepacki, Simone Sbaraglia, Robert Edward Walkup, Hui-Fang Wen, Hao Yu
Inventors: I-Hsin Chung, Kattamuri Ekanadham, David Joseph Klepacki, Simone Sbaraglia, Robert Edward Walkup, Hui-Fang Wen, Hao Yu
IPC classification: G06F9/44
CPC classification: G06F11/3466, G06F8/4441, G06F2201/865
Abstract: A method for application performance data collection includes steps or acts of: customizing a performance tool for collecting application performance data of an application; modifying the application by inserting the performance tool while the application does not need to be rebuilt from the source; executing the application; and collecting the application execution performance data such that only interesting data is collected. Customizing the performance tool proceeds by implementing at least one configurable tracing function that can be programmed by the user; compiling the function(s) into an object file; and inserting the object file into the performance tool using binary instrumentation.

30. Granted invention patent

US08490061B2 Profiling application performance according to data structure (status: lapsed)
Publication No.: US08490061B2
Publication date: 2013-07-16
Application No.: US12436894
Application date: 2009-05-07
Applicants: I-Hsin Chung, Guojing Cong, Kattamuri Ekanadham, David Klepacki, Simone Sbaraglia, Hui-Fang Wen
Inventors: I-Hsin Chung, Guojing Cong, Kattamuri Ekanadham, David Klepacki, Simone Sbaraglia, Hui-Fang Wen
IPC classification: G06F9/44, G06F13/00, G06F13/28, G06F9/46
CPC classification: G06F8/443, G06F11/3471, G06F2201/865
Abstract: During runtime of a binary program file, streams of instructions are executed and memory references, generated by instrumentation applied to given ones of the instructions that refer to memory locations, are collected. A transformation is performed, based on the executed streams of instructions and the collected memory references, to obtain a table. The table lists memory events of interest for active data structures for each function in the program file. The transformation is performed to translate memory addresses for given ones of the instructions and given ones of the data structures into locations and variable names in a source file corresponding to the binary file. At least the memory events of interest are displayed, and the display is organized so as to correlate the memory events of interest with corresponding ones of the data structures.
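
The address-to-variable translation and the per-function table described in this abstract can be sketched as follows. This is an illustration under assumptions rather than the patented transformation: the symbol list of `(start_address, size_in_bytes, name)` tuples, the reference tuples of `(function, address, event)`, and the helper names `build_lookup` and `summarize` are all hypothetical, and data structures are assumed not to overlap in memory.

```python
import bisect
from collections import Counter


def build_lookup(symbols):
    """Return a function mapping an address to a variable name, or None.

    symbols: iterable of (start_address, size_in_bytes, name), non-overlapping.
    """
    ordered = sorted(symbols)
    starts = [start for start, _size, _name in ordered]

    def lookup(address):
        # Find the last structure that starts at or before the address.
        i = bisect.bisect_right(starts, address) - 1
        if i >= 0:
            start, size, name = ordered[i]
            if address < start + size:
                return name
        return None

    return lookup


def summarize(references, lookup):
    """Count memory events per (function, data structure, event kind)."""
    table = Counter()
    for function, address, event in references:
        name = lookup(address)
        if name is not None:        # ignore addresses outside known structures
            table[(function, name, event)] += 1
    return table
```

The resulting `Counter` plays the role of the table in the abstract: each key ties a function and a data structure to one kind of memory event, so a display can group events by data structure.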
