    • 1. Granted Patent
    • Performing high granularity prefetch from remote memory into a cache on a device without change in address
    • US08549231B2
    • 2013-10-01
    • US12684689
    • 2010-01-08
    • Rabin A. Sugumar; Bjørn Dag Johnsen; Ben Sum
    • G06F12/08
    • G06F12/0862; G06F12/1081
    • Provided is a method, which may be performed on a computer, for prefetching data over an interface. The method may include receiving a first data prefetch request for first data of a first data size stored at a first physical address corresponding to a first virtual address. The first data prefetch request may include second data specifying the first virtual address and third data specifying the first data size. The first virtual address and the first data size may define a first virtual address range. The method may also include converting the first data prefetch request into a first data retrieval request. To convert the first data prefetch request into a first data retrieval request the first virtual address specified by the second data may be translated into the first physical address. The method may further include issuing the first data retrieval request at the interface, receiving the first data at the interface and storing at least a portion of the received first data in a cache. Storing may include setting each of one or more cache tags associated with the at least a portion of the received first data to correspond to the first physical address.
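The flow the abstract describes (a prefetch request carrying a virtual address and a size, translated to a physical address, with cache lines tagged by that physical address) can be sketched as follows. This is a minimal illustrative model, not the patented implementation; `PAGE_SIZE`, the flat `page_table`, and all values are invented for the example.

```python
PAGE_SIZE = 4096

# Toy page table: virtual page number -> physical page number (made-up values).
page_table = {0x10: 0x7A, 0x11: 0x7B}

def translate(vaddr):
    """Translate a virtual address to a physical address via the page table."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    return page_table[vpn] * PAGE_SIZE + offset

def prefetch(remote_memory, cache, vaddr, size, line=64):
    """Convert a (virtual address, size) prefetch request into physical-address
    retrievals, storing each fetched line in the cache."""
    for off in range(0, size, line):
        paddr = translate(vaddr + off)
        # The cache tag is the *physical* address, so later accesses hit
        # without any change of address.
        cache[paddr] = [remote_memory[paddr + i] for i in range(line)]

base = translate(0x10 * PAGE_SIZE)
remote = {base + i: i for i in range(256)}   # 256 bytes of "remote memory"
cache = {}
prefetch(remote, cache, 0x10 * PAGE_SIZE, 256)
```

A 256-byte request at 64-byte line granularity produces four retrievals, each cached under its physical address.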
    • 2. Granted Patent
    • Network use of virtual addresses without pinning or registration
    • US08234407B2
    • 2012-07-31
    • US12495805
    • 2009-06-30
    • Rabin A. Sugumar; Robert W. Wittosch; Bjørn Dag Johnsen; William M. Ortega
    • G06F15/16; G06F13/36; G06F12/00
    • G06F12/1027; G06F12/1081
    • A system comprising a compute node and coupled network adapter (NA) that allows the NA to directly use CPU virtual addresses without pinning pages in system memory. The NA performs memory accesses in response to requests from various sources. Each request source is assigned to context. Each context has a descriptor that controls the address translation performed by the NA. When the CPU wants to update translation information it sends a synchronization request to the NA that causes the NA to stop fetching a category of requests associated with the information update. The category may be requests associated with a context or a page address. Once the NA determines that all the fetched requests in the category have completed it notifies the CPU and the CPU performs the information update. Once the update is complete, the CPU clears the synchronization request and the NA starts fetching requests in the category.
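The synchronization handshake in the abstract (CPU pauses a category of NA requests, waits for in-flight requests to drain, updates the translation, then resumes) can be sketched as a small state machine. This is an illustrative model under assumed names (`NetworkAdapter`, `update_translation`, `drain`), not the actual hardware protocol.

```python
class NetworkAdapter:
    """Toy model of the NA's request-fetch side."""
    def __init__(self):
        self.paused = set()    # categories currently under a sync request
        self.in_flight = {}    # category -> fetched-but-uncompleted requests

    def fetch(self, category):
        if category in self.paused:
            return False       # sync pending: stop fetching this category
        self.in_flight[category] = self.in_flight.get(category, 0) + 1
        return True

    def complete(self, category):
        self.in_flight[category] -= 1

    def quiesced(self, category):
        return self.in_flight.get(category, 0) == 0

def update_translation(na, category, page_table, vpn, new_ppn, drain):
    na.paused.add(category)       # 1. CPU sends sync request; NA stops fetching
    drain(na, category)           # 2. already-fetched requests run to completion
    assert na.quiesced(category)  # 3. NA notifies CPU the category is quiet
    page_table[vpn] = new_ppn     # 4. CPU updates the translation safely
    na.paused.discard(category)   # 5. CPU clears the sync; NA resumes fetching
```

Because the NA never holds a stale translation across step 4, pages need not be pinned in system memory.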
    • 3. Patent Application
    • Performing High Granularity Prefetch from Remote Memory into a Cache on a Device without Change in Address
    • US20110173396A1
    • 2011-07-14
    • US12684689
    • 2010-01-08
    • Rabin A. Sugumar; Bjorn Dag Johnsen; Ben Sum
    • G06F12/10; G06F12/00; G06F12/08
    • G06F12/0862; G06F12/1081
    • Provided is a method, which may be performed on a computer, for prefetching data over an interface. The method may include receiving a first data prefetch request for first data of a first data size stored at a first physical address corresponding to a first virtual address. The first data prefetch request may include second data specifying the first virtual address and third data specifying the first data size. The first virtual address and the first data size may define a first virtual address range. The method may also include converting the first data prefetch request into a first data retrieval request. To convert the first data prefetch request into a first data retrieval request the first virtual address specified by the second data may be translated into the first physical address. The method may further include issuing the first data retrieval request at the interface, receiving the first data at the interface and storing at least a portion of the received first data in a cache. Storing may include setting each of one or more cache tags associated with the at least a portion of the received first data to correspond to the first physical address.
    • 4. Patent Application
    • Scalable Interface for Connecting Multiple Computer Systems Which Performs Parallel MPI Header Matching
    • US20100232448A1
    • 2010-09-16
    • US12402804
    • 2009-03-12
    • Rabin A. Sugumar; Lars Paul Huse; Bjorn Dag Johnsen
    • H04L12/56; H04L12/66
    • G06F15/17337
    • An interface device for a compute node in a computer cluster which performs Message Passing Interface (MPI) header matching using parallel matching units. The interface device comprises a memory that stores posted receive queues and unexpected queues. The posted receive queues store receive requests from a process executing on the compute node. The unexpected queues store headers of send requests (e.g., from other compute nodes) that do not have a matching receive request in the posted receive queues. The interface device also comprises a plurality of hardware pipelined matcher units. The matcher units perform header matching to determine if a header in the send request matches any headers in any of the plurality of posted receive queues. Matcher units perform the header matching in parallel. In other words, the plural matching units are configured to search the memory concurrently to perform header matching.
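The parallel matching the abstract describes (several matcher units concurrently scanning the posted-receive queue, with unmatched headers falling into an unexpected queue) can be sketched in software. The `(src, tag)` key is a simplification of real MPI envelope matching, and the interleaved split across four "units" is invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

UNITS = 4  # number of matcher units (illustrative)

def match_header(header, posted, unexpected):
    """Scan the posted-receive queue with UNITS matchers in parallel."""
    def scan(chunk):
        hits = [i for i, recv in chunk
                if (recv["src"], recv["tag"]) == (header["src"], header["tag"])]
        return min(hits) if hits else None

    entries = list(enumerate(posted))
    chunks = [entries[u::UNITS] for u in range(UNITS)]  # interleave entries
    with ThreadPoolExecutor(max_workers=UNITS) as pool:
        hits = [h for h in pool.map(scan, chunks) if h is not None]
    if hits:
        return posted.pop(min(hits))  # MPI ordering: earliest posted receive wins
    unexpected.append(header)         # no match: queue the header as unexpected
    return None
```

Each unit returns the earliest index it found; taking the minimum across units preserves MPI's posted-order matching semantics even though the scan itself is concurrent.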
    • 5. Granted Patent
    • Branch prediction structure with branch direction entries that share branch prediction qualifier entries
    • US07380110B1
    • 2008-05-27
    • US10660169
    • 2003-09-11
    • Robert D. Nuckolls; Rabin A. Sugumar; Chandra M. R. Thimmannagari
    • G06F9/40; G06F9/44
    • G06F9/3848
    • An efficient branch prediction structure is described that bifurcates a branch prediction structure into at least two portions where information stored in the second portion is aliased amongst multiple entries of the first portion. In this way, overall storage (and layout area) can be reduced and scaling with a branch prediction structure that includes a (2N)K×1 branch direction entries and a (N/2)K×1 branch prediction qualifier entries is less dramatic than conventional techniques. An efficient branch prediction structure includes entries for branch direction indications and entries for branch prediction qualifier indications. The branch direction indication entries are more numerous than the branch prediction qualifier entries. An entry from the branch direction entries is selected based at least in part on a corresponding instruction instance identifier and an entry from the branch prediction qualifier entries is selected based at least in part on least significant bits of the instruction instance identifier.
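The aliasing the abstract describes (many direction entries sharing fewer qualifier entries, with the qualifier selected by the low bits of the instruction instance identifier) can be sketched with two arrays. Table sizes and field meanings here are illustrative, not the patented dimensions.

```python
DIRECTION_ENTRIES = 2048   # direction bits: the larger structure
QUALIFIER_ENTRIES = 512    # qualifiers shared (aliased) 4:1 among directions

directions = [0] * DIRECTION_ENTRIES   # taken / not-taken indications
qualifiers = [0] * QUALIFIER_ENTRIES   # e.g. confidence, shared across aliases

def predict(instruction_id):
    """Direction is indexed by the instruction instance identifier; the
    qualifier is indexed by its least significant bits, so several
    direction entries alias onto one qualifier entry."""
    d = directions[instruction_id % DIRECTION_ENTRIES]
    q = qualifiers[instruction_id % QUALIFIER_ENTRIES]
    return d, q
```

Instruction ids 5 and 517 keep distinct direction bits but share qualifier entry 5, which is exactly the storage saving the design targets.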
    • 6. Granted Patent
    • Thread switch circuit design and signal encoding for vertical threading
    • US07120915B1
    • 2006-10-10
    • US09716545
    • 2000-11-20
    • Gajendra P. Singh; Joseph I. Chamdani; Renu Raman; Rabin A. Sugumar
    • G06F9/46; G06F9/40; G06F9/44
    • G06F9/3851; G06F9/3869
    • A method and apparatus for implementing vertical multi-threading in a microprocessor without implementing additional signal wires in the processor has been developed. The method uses a pre-existing signal to serve as a multi-function signal such that the multi-function signal can be used for clock enable, clock disable, and scan enable functions. The single multi-function signal exhibits multiple functionalities as needed by a flip-flop to operate in a plurality of modes. The method allows for the use of a pre-existing signal wire to be used as a process thread switch signal that would otherwise have to be explicitly hard-wired in the absence of the multi-functioning signal. The method further includes allowing multiple-bit flip-flops to be placed at sequential stages in a pipeline in order to facilitate vertical multi-threading and, in effect, increase processor performance. The apparatus provides means for distinguishing between specific characteristics exhibited by the multi-function signal. The apparatus further provides means for generating intermediary signals within a control block and then generating output signals to a data storage block. The apparatus also involves generating timing signals to a plurality of flip-flops dependent upon the behavior of the multi-function signal.
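The idea of one pre-existing wire carrying clock-enable, clock-disable, and scan-enable meanings, decoded from its behavior, can be modeled abstractly. The encoding below (held high, held low, toggling over a sample window) is entirely invented for illustration; the patent does not disclose this particular scheme.

```python
def classify(samples):
    """Decode the multi-function signal from its behavior over a window
    (invented encoding: held high = clock enable, held low = clock
    disable / thread switch, toggling = scan enable)."""
    if all(samples):
        return "clock_enable"
    if not any(samples):
        return "clock_disable"
    return "scan_enable"

class MultiBitFlop:
    """Multi-bit flip-flop holding one state bit per vertical thread."""
    def __init__(self, nthreads=2):
        self.state = [0] * nthreads
        self.active = 0

    def clock(self, d, samples):
        mode = classify(samples)
        if mode == "clock_enable":
            self.state[self.active] = d   # normal capture for active thread
        elif mode == "clock_disable":
            self.active = (self.active + 1) % len(self.state)  # thread switch
        # "scan_enable" would shift scan-chain data; omitted in this sketch
        return self.state[self.active]
```

The point of the model: no dedicated thread-switch wire exists; the switch is inferred from how the shared signal behaves.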
    • 7. Patent Application
    • METHOD AND SYSTEM FOR OFFLOADING COMPUTATION FLEXIBLY TO A COMMUNICATION ADAPTER
    • US20130007181A1
    • 2013-01-03
    • US13173473
    • 2011-06-30
    • Rabin A. Sugumar; David Brower
    • G06F15/167
    • G06F9/5027; G06F2209/509
    • A method for offloading computation flexibly to a communication adapter includes receiving a message that includes a procedure image identifier associated with a procedure image of a host application, determining a procedure image and a communication adapter processor using the procedure image identifier, and forwarding the first message to the communication adapter processor configured to execute the procedure image. The method further includes executing, on the communication adapter processor independent of a host processor, the procedure image in communication adapter memory by acquiring a host memory latch for a memory block in host memory, reading the memory block in the host memory after acquiring the host memory latch, manipulating, by executing the procedure image, the memory block in the communication adapter memory to obtain a modified memory block, committing the modified memory block to the host memory, and releasing the host memory latch.
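The latch-read-manipulate-commit-release sequence in the abstract maps naturally onto a lock-guarded read-modify-write. A minimal sketch, with assumed names (`adapter_execute`, `host_memory`, `latches`) and a Python `Lock` standing in for the host memory latch:

```python
import threading

host_memory = {"block0": [1, 2, 3]}
latches = {"block0": threading.Lock()}   # one latch per host memory block

def adapter_execute(procedure_image, block_id):
    """Run a procedure image on the adapter processor, independent of the
    host CPU: latch, read, manipulate, commit, release."""
    with latches[block_id]:                          # 1. acquire host memory latch
        adapter_copy = list(host_memory[block_id])   # 2. read block into adapter memory
        modified = procedure_image(adapter_copy)     # 3. manipulate the local copy
        host_memory[block_id] = modified             # 4. commit back to host memory
    return modified                                  # 5. latch released on exit
```

Holding the latch across steps 2 through 4 is what lets the adapter mutate the block without coordinating with the host processor mid-update.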
    • 8. Patent Application
    • Scalable Interface for Connecting Multiple Computer Systems Which Performs Parallel MPI Header Matching
    • US20120243542A1
    • 2012-09-27
    • US13489496
    • 2012-06-06
    • Rabin A. Sugumar; Lars Paul Huse; Bjørn Dag Johnsen
    • H04L12/56
    • G06F15/17337
    • An interface device for a compute node in a computer cluster which performs Message Passing Interface (MPI) header matching using parallel matching units. The interface device comprises a memory that stores posted receive queues and unexpected queues. The posted receive queues store receive requests from a process executing on the compute node. The unexpected queues store headers of send requests (e.g., from other compute nodes) that do not have a matching receive request in the posted receive queues. The interface device also comprises a plurality of hardware pipelined matcher units. The matcher units perform header matching to determine if a header in the send request matches any headers in any of the plurality of posted receive queues. Matcher units perform the header matching in parallel. In other words, the plural matching units are configured to search the memory concurrently to perform header matching.
    • 10. Patent Application
    • Software Aware Throttle Based Flow Control
    • US20100332676A1
    • 2010-12-30
    • US12495452
    • 2009-06-30
    • Rabin A. Sugumar; Bjørn Dag Johnsen; Lars Paul Huse; William M. Ortega
    • G06F15/16
    • H04L41/065; H04L47/10; H04L47/26; H04L47/283; H04L47/30; H04L49/00; H04L49/90
    • A system, comprising a compute node and coupled network adapter (NA), that supports improved data transfer request buffering and a more efficient method of determining the completion status of data transfer requests. Transfer requests received by the NA are stored in a first buffer then transmitted on a network interface. When significant network delays are detected and the first buffer is full, the NA sets a flag to stop software issuing transfer requests. Compliant software checks this flag before sending requests and does not issue further requests. A second NA buffer stores additional received transfer requests that were perhaps in-transit. When conditions improve the flag is cleared and the first buffer used again. Completion status is efficiently determined by grouping network transfer requests. The NA counts received requests and completed network requests for each group. Software determines if a group of requests is complete by reading a count value.
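The two mechanisms in the abstract, a throttle flag backed by a second buffer for in-transit requests, and per-group received/completed counters for cheap completion checks, can be sketched together. Buffer sizes and all names (`ThrottledAdapter`, `submit`, `transmit_one`) are illustrative.

```python
class ThrottledAdapter:
    """Toy model of the two-buffer throttle and group completion counters."""
    def __init__(self, first_cap=4):
        self.first, self.second = [], []
        self.first_cap = first_cap
        self.throttle = False   # compliant software checks this before issuing
        self.received = {}      # group -> transfer requests received
        self.completed = {}     # group -> transfer requests completed

    def submit(self, req, group):
        # Requests already in transit when the flag was set land in buffer 2.
        buf = self.second if self.throttle else self.first
        buf.append(req)
        if len(self.first) >= self.first_cap:
            self.throttle = True          # signal software to stop issuing
        self.received[group] = self.received.get(group, 0) + 1

    def transmit_one(self, group):
        req = (self.first or self.second).pop(0)
        self.completed[group] = self.completed.get(group, 0) + 1
        if len(self.first) < self.first_cap and not self.second:
            self.throttle = False         # conditions improved: clear the flag
        return req

    def group_done(self, group):
        # Software reads two counters instead of tracking each request.
        return self.received.get(group, 0) == self.completed.get(group, 0)
```

A group is complete exactly when its received and completed counts agree, which is the single-read completion check the abstract describes.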