    • 2. Granted Patent
    • Automatic kernel migration for heterogeneous cores
    • Publication number: US08683468B2
    • Publication date: 2014-03-25
    • Application number: US13108438
    • Filing date: 2011-05-16
    • Inventors: Mauricio Breternitz, Patryk Kaminski, Keith Lowery, Anton Chernoff, Dz-Ching Ju
    • IPC: G06F9/46
    • CPC: G06F9/4856; G06F9/5066
    • A system and method for automatically migrating the execution of work units between multiple heterogeneous cores. A computing system includes a first processor core with a single instruction multiple data micro-architecture and a second processor core with a general-purpose micro-architecture. A compiler predicts execution of a function call in a program migrates at a given location to a different processor core. The compiler creates a data structure to support moving live values associated with the execution of the function call at the given location. An operating system (OS) scheduler schedules at least code before the given location in program order to the first processor core. In response to receiving an indication that a condition for migration is satisfied, the OS scheduler moves the live values to a location indicated by the data structure for access by the second processor core and schedules code after the given location to the second processor core.
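The abstract above describes a compiler-chosen migration point whose live values are packed into a generated data structure so execution can resume on a different core. A minimal Python sketch of that flow (all names here are invented for illustration; a real implementation operates on compiled code and OS scheduling, not Python functions):

```python
# Hypothetical sketch of the migration scheme: code before a chosen point
# runs on one "core", live values are packed into a compiler-generated
# structure, and code after the point resumes on the other core.

from dataclasses import dataclass

@dataclass
class LiveValues:
    """Stands in for the compiler-generated structure of values live at the migration point."""
    i: int    # loop index at the migration point
    acc: int  # partial accumulator value

def first_half(data, migrate_at):
    """Runs on core 0 up to the migration point, then returns the live state."""
    acc = 0
    for i, x in enumerate(data):
        if i == migrate_at:              # compiler-predicted migration location
            return LiveValues(i=i, acc=acc)
        acc += x
    return LiveValues(i=len(data), acc=acc)

def second_half(data, live):
    """Runs on core 1, resuming from the saved live values."""
    acc = live.acc
    for x in data[live.i:]:
        acc += x
    return acc

data = list(range(10))
live = first_half(data, migrate_at=5)    # the "OS scheduler" sees the migration
total = second_half(data, live)          # condition and hands live values to core 1
assert total == sum(data)
```

The key property modeled is that `second_half` needs nothing from `first_half` except the explicit `LiveValues` record, which is what lets the two halves run on different cores.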
    • 3. Patent Application
    • BRANCH REMOVAL BY DATA SHUFFLING
    • Publication number: US20120331278A1
    • Publication date: 2012-12-27
    • Application number: US13167517
    • Filing date: 2011-06-23
    • Inventors: Mauricio Breternitz, Patryk Kaminski, Keith Lowery
    • IPC: G06F9/38
    • CPC: G06F9/5027; G06F8/451; G06F9/5044
    • A system and method for automatically optimizing parallel execution of multiple work units in a processor by reducing a number of branch instructions. A computing system includes a first processor core with a general-purpose micro-architecture and a second processor core with a single instruction multiple data (SIMD) micro-architecture. A compiler detects and evaluates branches within function calls with one or more records of data used to determine one or more outcomes. Multiple compute sub-kernels are generated, each comprising code from the function corresponding to a unique outcome of the branch. Multiple work units are produced by assigning one or more records of data corresponding to a given outcome of the branch to one of the multiple compute sub-kernels associated with the given outcome. The branch is removed. An operating system scheduler schedules each of the one or more compute sub-kernels to the first processor core or to the second processor core.
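The technique above partitions records by branch outcome so each specialized sub-kernel runs branch-free. A toy Python sketch of that idea (the kernel and predicate are invented examples; a real compiler does this on compiled compute kernels for SIMD hardware, where divergent branches are costly):

```python
# Hypothetical sketch of branch removal by data shuffling: records are
# grouped by the outcome of the branch, and each group is handed to a
# branch-free sub-kernel specialized for that outcome.

def kernel_with_branch(record):
    if record % 2 == 0:      # the branch the compiler wants to remove
        return record * 2    # outcome A
    else:
        return record + 1    # outcome B

# Branch-free sub-kernels, one per outcome.
def sub_kernel_even(records):
    return [r * 2 for r in records]

def sub_kernel_odd(records):
    return [r + 1 for r in records]

records = list(range(8))
# "Shuffle" the data: partition records by which outcome they would take.
evens = [r for r in records if r % 2 == 0]
odds  = [r for r in records if r % 2 != 0]

# Each sub-kernel now runs without any branch; a scheduler could place
# each one on the SIMD core or the general-purpose core independently.
results = sub_kernel_even(evens) + sub_kernel_odd(odds)
assert sorted(results) == sorted(kernel_with_branch(r) for r in records)
```

On SIMD hardware the payoff is that every lane in a sub-kernel follows the same path, so no lanes sit idle waiting out the other side of the branch.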
    • 4. Granted Patent
    • System and method for NUMA-aware heap memory management
    • Publication number: US08245008B2
    • Publication date: 2012-08-14
    • Application number: US12372839
    • Filing date: 2009-02-18
    • Inventors: Patryk Kaminski, Keith Lowery
    • IPC: G06F13/14
    • CPC: G06F12/023; G06F12/0284; G06F2212/2542
    • A system and method for allocating memory to multi-threaded programs on a Non-Uniform Memory Access (NUMA) computer system using a NUMA-aware memory heap manager is disclosed. In embodiments, a NUMA-aware memory heap manager may attempt to maximize the locality of memory allocations in a NUMA system by allocating memory blocks that are near, or on the same node as, the thread that requested the memory allocation. A heap manager may keep track of each memory block's location and satisfy allocation requests by determining an allocation node dependent, at least in part, on its locality to that of the requesting thread. When possible, a heap manager may attempt to allocate memory on the same node as the requesting thread. The heap manager may be non-application-specific, may employ multiple levels of free block caching, and/or may employ various listings that associate given memory blocks with each NUMA node.
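The per-node free-list policy described above can be sketched in a few lines of Python. This is a toy model of the placement policy only (class and method names are invented); a real heap manager tracks physical pages, block sizes, and per-node caches:

```python
# Hypothetical sketch of a NUMA-aware allocator: one free list per node;
# an allocation request is served from the requesting thread's node when
# possible, falling back to a remote node otherwise.

class NumaHeap:
    def __init__(self, blocks_per_node, num_nodes=2):
        # Per-node free lists; each block id records its home node,
        # modeling the "listings that associate blocks with each node".
        self.free = {n: [(n, i) for i in range(blocks_per_node)]
                     for n in range(num_nodes)}

    def alloc(self, thread_node):
        # Prefer a block local to the requesting thread's node.
        if self.free[thread_node]:
            return self.free[thread_node].pop()
        # Fall back to any node with free blocks (a remote allocation).
        for node, blocks in self.free.items():
            if blocks:
                return blocks.pop()
        raise MemoryError("no free blocks on any node")

    def free_block(self, block):
        home_node, _ = block
        self.free[home_node].append(block)  # return block to its home node's list

heap = NumaHeap(blocks_per_node=2)
block = heap.alloc(thread_node=1)
assert block[0] == 1          # satisfied locally on node 1
heap.alloc(1)
remote = heap.alloc(1)
assert remote[0] == 0         # node 1 exhausted, falls back to node 0
```

Returning a freed block to its *home* node's list (rather than the freeing thread's node) is what keeps the location bookkeeping accurate across threads.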
    • 5. Patent Application
    • AUTOMATIC LOAD BALANCING FOR HETEROGENEOUS CORES
    • Publication number: US20120291040A1
    • Publication date: 2012-11-15
    • Application number: US13105250
    • Filing date: 2011-05-11
    • Inventors: Mauricio Breternitz, Patryk Kaminski, Keith Lowery, Anton Chernoff
    • IPC: G06F9/46
    • CPC: G06F9/5083
    • A system and method for efficient automatic scheduling of the execution of work units between multiple heterogeneous processor cores. A processing node includes a first processor core with a general-purpose micro-architecture and a second processor core with a single instruction multiple data micro-architecture. A computer program comprises one or more compute kernels, or function calls. A compiler computes pre-runtime information of the given function call. A runtime scheduler produces one or more work units by matching each of the one or more kernels with an associated record of data. The scheduler assigns work units either to the first or to the second processor core based at least in part on the computed pre-runtime information. In addition, the scheduler is able to change an original assignment for a waiting work unit based on dynamic runtime behavior of other work units corresponding to a same kernel as the waiting work unit.
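The abstract describes a two-stage policy: a static, pre-runtime estimate makes the initial core assignment, and waiting work units can be reassigned once runtime measurements from earlier units of the same kernel arrive. A hedged Python sketch of that policy (the heuristic, threshold, and field names are all invented for illustration):

```python
# Hypothetical sketch of two-stage scheduling for heterogeneous cores:
# static assignment from pre-runtime information, then dynamic correction
# from the measured behavior of completed units of the same kernel.

def static_assign(kernel):
    # Pre-runtime heuristic: data-parallel kernels go to the SIMD core,
    # everything else to the general-purpose core.
    return "simd" if kernel["data_parallel"] else "cpu"

def rebalance(assignment, observed_runtime_ms):
    # Dynamic correction: if earlier units of this kernel ran slowly on
    # the SIMD core (e.g. due to divergent branches), move the waiting
    # units to the general-purpose core instead.
    if assignment == "simd" and observed_runtime_ms > 10.0:
        return "cpu"
    return assignment

kernel = {"name": "k0", "data_parallel": True}
first = static_assign(kernel)            # initial placement: SIMD core
# A completed work unit of k0 reports a long runtime, so a waiting
# work unit of the same kernel is reassigned before it starts.
moved = rebalance(first, observed_runtime_ms=25.0)
assert first == "simd" and moved == "cpu"
```

The point of the second stage is that the static estimate can be wrong in data-dependent ways the compiler cannot see, and only units that have not started yet are cheap to move.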
    • 6. Granted Patent
    • Automatic load balancing for heterogeneous cores
    • Publication number: US08782645B2
    • Publication date: 2014-07-15
    • Application number: US13105250
    • Filing date: 2011-05-11
    • Inventors: Mauricio Breternitz, Patryk Kaminski, Keith Lowery, Anton Chernoff
    • IPC: G06F9/46; G06F9/50
    • CPC: G06F9/5083
    • A system and method for efficient automatic scheduling of the execution of work units between multiple heterogeneous processor cores. A processing node includes a first processor core with a general-purpose micro-architecture and a second processor core with a single instruction multiple data micro-architecture. A computer program comprises one or more compute kernels, or function calls. A compiler computes pre-runtime information of the given function call. A runtime scheduler produces one or more work units by matching each of the one or more kernels with an associated record of data. The scheduler assigns work units either to the first or to the second processor core based at least in part on the computed pre-runtime information. In addition, the scheduler is able to change an original assignment for a waiting work unit based on dynamic runtime behavior of other work units corresponding to a same kernel as the waiting work unit.
    • 7. Patent Application
    • AUTOMATIC KERNEL MIGRATION FOR HETEROGENEOUS CORES
    • Publication number: US20120297163A1
    • Publication date: 2012-11-22
    • Application number: US13108438
    • Filing date: 2011-05-16
    • Inventors: Mauricio Breternitz, Patryk Kaminski, Keith Lowery, Anton Chernoff, Dz-Ching Ju
    • IPC: G06F9/315; G06F15/80
    • CPC: G06F9/4856; G06F9/5066
    • A system and method for automatically migrating the execution of work units between multiple heterogeneous cores. A computing system includes a first processor core with a single instruction multiple data micro-architecture and a second processor core with a general-purpose micro-architecture. A compiler predicts execution of a function call in a program migrates at a given location to a different processor core. The compiler creates a data structure to support moving live values associated with the execution of the function call at the given location. An operating system (OS) scheduler schedules at least code before the given location in program order to the first processor core. In response to receiving an indication that a condition for migration is satisfied, the OS scheduler moves the live values to a location indicated by the data structure for access by the second processor core and schedules code after the given location to the second processor core.
    • 8. Patent Application
    • System and Method for NUMA-Aware Heap Memory Management
    • Publication number: US20100211756A1
    • Publication date: 2010-08-19
    • Application number: US12372839
    • Filing date: 2009-02-18
    • Inventors: Patryk Kaminski, Keith Lowery
    • IPC: G06F12/00
    • CPC: G06F12/023; G06F12/0284; G06F2212/2542
    • A system and method for allocating memory to multi-threaded programs on a Non-Uniform Memory Access (NUMA) computer system using a NUMA-aware memory heap manager is disclosed. In embodiments, a NUMA-aware memory heap manager may attempt to maximize the locality of memory allocations in a NUMA system by allocating memory blocks that are near, or on the same node as, the thread that requested the memory allocation. A heap manager may keep track of each memory block's location and satisfy allocation requests by determining an allocation node dependent, at least in part, on its locality to that of the requesting thread. When possible, a heap manager may attempt to allocate memory on the same node as the requesting thread. The heap manager may be non-application-specific, may employ multiple levels of free block caching, and/or may employ various listings that associate given memory blocks with each NUMA node.