Quick Patent Search - Search Global Patents Quickly, Free Commercial-Use Patent Database - IPRDB

1. Granted invention patent

US06851071B2 Apparatus and method of repairing a processor array for a failure detected at runtime (status: lapsed)
Publication (announcement) number: US06851071B2
Publication (announcement) date: 2005-02-01
Application number: US09974967
Application date: 2001-10-11
Applicants: Douglas Craig Bossen , Daniel James Henderson , Raymond Leslie Hicks , Alongkorn Kitamorn , David Otto Lewis , Thomas Alan Liebsch
Inventors: Douglas Craig Bossen , Daniel James Henderson , Raymond Leslie Hicks , Alongkorn Kitamorn , David Otto Lewis , Thomas Alan Liebsch
IPC classification: G06F11/00 , G06F11/10 , G06F11/14 , G06F11/16
CPC classification: G06F11/1064 , G06F11/076 , G06F11/0772 , G06F11/079 , G06F11/0793 , G06F11/142 , G06F11/1425 , G06F2201/81 , G11C2029/0401 , G11C2029/0409
Abstract: An apparatus and method of repairing a processor array for a failure detected at runtime in a system supporting persistent component deallocation are provided. The apparatus and method of the present invention allow redundant array bits to be used for recoverable faults detected in arrays during run time, instead of only at system boot, while still maintaining the dynamic and persistent processor deallocation features of the computing system. With the apparatus and method of the present invention, a failure of a cache array is detected and a determination is made as to whether a repairable failure threshold is exceeded during runtime. If this threshold is exceeded, a determination is made as to whether cache array redundancy may be applied to correct the failure, i.e., a bit error. If so, the cache array redundancy is applied without marking the processor as unavailable. At some later time, the system undergoes a re-initial program load (re-IPL), at which time it is determined whether a second failure of the processor occurs. If a second failure occurs, a determination is made as to whether any status bits are set for arrays other than the cache array that experienced the present failure; if so, the processor is marked unavailable. If not, a determination is made as to whether cache redundancy can be applied to correct the failure. If so, the failure is corrected using the cache redundancy. If not, the processor is marked unavailable.
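The runtime and re-IPL decision flow in this abstract can be sketched as two small functions. The following is a minimal model, not the patented firmware. The names `Processor`, `REPAIR_THRESHOLD`, `spare_bits` and `apply_redundancy` are hypothetical. Two further points are assumptions: a status bit is set when the runtime threshold is exceeded, and the processor is deallocated when runtime redundancy cannot be applied (the abstract does not state that branch).

```python
from dataclasses import dataclass, field

# Hypothetical threshold: repair is attempted once a cache array has logged
# more than this many recoverable errors during runtime.
REPAIR_THRESHOLD = 3


@dataclass
class Processor:
    name: str
    spare_bits: int = 1                                # unused redundant array bits (assumed count)
    error_counts: dict = field(default_factory=dict)   # array name -> recoverable errors seen
    status_bits: set = field(default_factory=set)      # arrays whose threshold was exceeded
    available: bool = True


def apply_redundancy(proc: Processor) -> bool:
    """Steer around the failing bit with a spare bit, if one is left."""
    if proc.spare_bits > 0:
        proc.spare_bits -= 1
        return True
    return False


def on_runtime_error(proc: Processor, array: str) -> str:
    proc.error_counts[array] = proc.error_counts.get(array, 0) + 1
    if proc.error_counts[array] <= REPAIR_THRESHOLD:
        return "logged"
    proc.status_bits.add(array)      # assumption: threshold crossing sets a status bit
    if apply_redundancy(proc):
        return "repaired"            # processor is NOT marked unavailable
    proc.available = False           # assumed fallback when no spare bit is left
    return "deallocated"


def on_reipl_failure(proc: Processor, array: str) -> str:
    """A second failure of `array` observed at re-IPL."""
    if proc.status_bits - {array}:   # some other array already has a status bit set
        proc.available = False
        return "deallocated"
    if apply_redundancy(proc):
        return "repaired"
    proc.available = False
    return "deallocated"
```

Usage: four recoverable errors on the same array produce three "logged" results and then "repaired". With only one spare bit, a second failure at re-IPL then deallocates the processor.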

2. Granted invention patent

US06789048B2 Method, apparatus, and computer program product for deconfiguring a processor (status: in force)
Publication (announcement) number: US06789048B2
Publication (announcement) date: 2004-09-07
Application number: US10116626
Application date: 2002-04-04
Applicants: Richard Louis Arndt , Douglas Marvin Benignus , Douglas Craig Bossen , Daniel James Henderson , Alongkorn Kitamorn
Inventors: Richard Louis Arndt , Douglas Marvin Benignus , Douglas Craig Bossen , Daniel James Henderson , Alongkorn Kitamorn
IPC classification: G06F11/30
CPC classification: G06F11/2236
Abstract: According to a method form of the invention, in a computer system whose processing load is distributed among a number of processors, test computations are performed at intervals by the floating point logic of a processor in response to stored test instructions. In response to test computations indicating an erroneous result by one of the processors, information is passed by a firmware process and entered into an operating system error log. In response to this information, an operating system deconfiguration service is notified of the error log entry, and the service deconfigures the indicated processor while the system is still running.
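The detect, log and deconfigure chain described above can be simulated in a few lines. This is a sketch under several assumptions. Each CPU's floating point unit is modeled as a callable. A known-answer computation stands in for the stored test instructions. The names `System`, `firmware_report`, `deconfig_service` and the log type `FP_TEST_MISCOMPARE` are hypothetical. Real code would also need to pin the test to each processor, which this simulation does not attempt.

```python
import math
from typing import Callable, Dict, Optional


def reference_computation() -> float:
    # Known-answer test: fsum of ten 0.1s rounds to exactly 1.0, so this is 1.5.
    return math.fsum([0.1] * 10) * 3.0 / 2.0


EXPECTED = 1.5


class System:
    def __init__(self, cpu_ids):
        self.online = set(cpu_ids)
        self.error_log = []                    # stands in for the OS error log

    def firmware_report(self, cpu: int) -> None:
        entry = {"cpu": cpu, "type": "FP_TEST_MISCOMPARE"}   # hypothetical entry format
        self.error_log.append(entry)
        self.deconfig_service(entry)           # OS service is notified of the new entry

    def deconfig_service(self, entry: dict) -> None:
        self.online.discard(entry["cpu"])      # remove the CPU; the others keep running

    def run_tests(self, compute: Dict[int, Optional[Callable[[], float]]]) -> None:
        """One test interval. compute[cpu] simulates that CPU's FP unit;
        None means a healthy unit that runs the reference computation."""
        for cpu, fn in compute.items():
            if cpu not in self.online:
                continue                       # already deconfigured: not tested again
            result = (fn or reference_computation)()
            if result != EXPECTED:
                self.firmware_report(cpu)
```

Exact equality is acceptable here only because the known answer is exactly representable. A real test suite would pick its operands with the same care.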

3. Granted invention patent

US06233680B1 Method and system for boot-time deconfiguration of a processor in a symmetrical multi-processing system (status: lapsed)
Publication (announcement) number: US06233680B1
Publication (announcement) date: 2001-05-15
Application number: US09165952
Application date: 1998-10-02
Applicants: Douglas Craig Bossen , Alongkorn Kitamorn , Charles Andrew McLaughlin
Inventors: Douglas Craig Bossen , Alongkorn Kitamorn , Charles Andrew McLaughlin
IPC classification: G06F15/177
CPC classification: G06F11/0772 , G06F11/0724 , G06F11/079 , G06F11/0793 , G06F15/177
Abstract: A method and system for deconfiguring a CPU in a processing system is disclosed. In one aspect, a processing system is disclosed that comprises a central processing unit (CPU) and a memory coupled to the CPU. The CPU includes an error status register for capturing information concerning the status of the CPU. The processing system includes a service processor for gathering and analyzing status information from the CPU error register. The processing system also includes a nonvolatile device coupled to the service processor. The nonvolatile device includes a deconfiguration area. The deconfiguration area stores information concerning the status of the CPU from the service processor. The deconfiguration area also provides information for deconfiguring a CPU during a boot time of the processing system. Accordingly, through the present invention, CPU errors are detected during normal computer operations by error detection logic. Service processor firmware uses this detection during any subsequent boot process to deallocate the defective CPU. This is accomplished through the use of error status registers within the CPU and a deconfiguration area in the nonvolatile device, which provides information directly to the service processor.
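The sketch below shows how a deconfiguration area lets a runtime error decision survive into the next boot. It is a minimal model, not the patented design. The NVRAM deconfiguration area is modeled as a JSON string. The rule that "any nonzero error status register means deconfigure" is an assumption. The class and method names are hypothetical.

```python
import json


class NonvolatileDevice:
    """Stands in for NVRAM; the deconfiguration area is a JSON blob here."""
    def __init__(self):
        self.deconfig_area = "{}"


class ServiceProcessor:
    def __init__(self, nvram: NonvolatileDevice):
        self.nvram = nvram

    def analyze(self, cpu_error_registers: dict) -> None:
        """Runtime: read each CPU's error status register (cpu_id -> int).
        Assumption: any nonzero value marks the CPU for deconfiguration."""
        area = json.loads(self.nvram.deconfig_area)
        for cpu, reg in cpu_error_registers.items():
            if reg != 0:
                area[str(cpu)] = {"status": "deconfigured", "esr": reg}
        self.nvram.deconfig_area = json.dumps(area)   # persisted across power cycles

    def boot(self, all_cpus: list) -> list:
        """Boot time: return only the CPUs not recorded in the deconfiguration area."""
        area = json.loads(self.nvram.deconfig_area)
        return [cpu for cpu in all_cpus if str(cpu) not in area]
```

A fresh `ServiceProcessor` built on the same `NonvolatileDevice` still skips the failed CPU. This models the point of the abstract: the error seen during normal operation is acted on at the next boot.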

4. Granted invention patent

US06516429B1 Method and apparatus for run-time deconfiguration of a processor in a symmetrical multi-processing system (status: lapsed)
Publication (announcement) number: US06516429B1
Publication (announcement) date: 2003-02-04
Application number: US09434767
Application date: 1999-11-04
Applicants: Douglas Craig Bossen , Alongkorn Kitamorn , Charles Andrew McLaughlin , John Thomas O'Quin, II
Inventors: Douglas Craig Bossen , Alongkorn Kitamorn , Charles Andrew McLaughlin , John Thomas O'Quin, II
IPC classification: G06F11/00
CPC classification: G06F11/2035 , G06F11/004 , G06F11/0724 , G06F11/076 , G06F11/079 , G06F11/2023 , G06F11/2028 , G06F2212/1032
摘要： A method and apparatus in a multiprocessor data processing system for managing a plurality of processors. Monitoring for recoverable errors in a set of processors is performed. Responsive to detecting a recoverable error for a processor in the set of processors, a determination is made as to whether the recoverable error indicates a trend towards an unrecoverable error. Responsive to a determination that the recoverable error indicates a trend towards an unrecoverable error, actions are initiated to stop the processor.
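The abstract does not say how a "trend towards an unrecoverable error" is measured. One common way to make such a call is a sliding-window count. In the sketch below, the criterion "more than `limit` recoverable errors within `window` seconds" is purely an assumption, and `TrendMonitor` is a hypothetical name. Adding a CPU to `stopped` stands in for initiating the actions that stop the processor.

```python
from collections import deque


class TrendMonitor:
    """Flags a processor when more than `limit` recoverable errors occur
    within `window` seconds (a hypothetical trend criterion)."""

    def __init__(self, limit: int = 3, window: float = 3600.0):
        self.limit = limit
        self.window = window
        self.events = {}        # cpu -> deque of recent error timestamps
        self.stopped = set()    # CPUs for which stop actions were initiated

    def record(self, cpu: int, timestamp: float) -> bool:
        q = self.events.setdefault(cpu, deque())
        q.append(timestamp)
        # Drop errors that have aged out of the window.
        while q and timestamp - q[0] > self.window:
            q.popleft()
        if len(q) > self.limit and cpu not in self.stopped:
            self.stopped.add(cpu)   # stand-in for "initiate actions to stop the processor"
            return True
        return False
```

Widely spaced errors never fill the window. A burst of four errors within one minute on the same CPU trips the monitor on the fourth error.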

5. Granted invention patent

US06243823B1 Method and system for boot-time deconfiguration of a memory in a processing system (status: lapsed)
Publication (announcement) number: US06243823B1
Publication (announcement) date: 2001-06-05
Application number: US09165955
Application date: 1998-10-02
Applicants: Douglas Craig Bossen , Alongkorn Kitamorn , Charles Andrew McLaughlin
Inventors: Douglas Craig Bossen , Alongkorn Kitamorn , Charles Andrew McLaughlin
IPC classification: G06F15/177
CPC classification: G06F11/142
摘要： A method and system for deconfiguring software in a processing system is disclosed. In one aspect, a processing system comprises a central processing unit (CPU), and a memory coupled to the CPU. The memory includes a memory array and a memory controller for capturing information concerning the status of the memory array. The processing system includes a service processor for gathering and analyzing status information from the memory controller. The processing system also includes a nonvolatile device coupled to the CPU and the service processor. The nonvolatile device includes a deconfiguration area. The deconfiguration area stores information concerning the status of the memory array from the service processor. The deconfiguration area also provides information for deconfiguring at least a portion of the memory array during a boot time of the processing system. Accordingly, through the present invention, memory errors are detected during normal computer operations by error detection logic. This detection is utilized during any subsequent boot process by service processor and CPU boot firmware to deallocate the defective memory module. This is accomplished through the use of error status registers within the memory controller and through the use of a deconfiguration area in the nonvolatile device which provides information directly to the CPU boot firmware.

6. Granted invention patent

US06332181B1 Recovery mechanism for L1 data cache parity errors (status: lapsed)
Publication (announcement) number: US06332181B1
Publication (announcement) date: 2001-12-18
Application number: US09072324
Application date: 1998-05-04
Applicants: Douglas Craig Bossen , Kevin Arthur Chiarot , Namratha Rajasekharaiah Jaisimha , Avijit Saha
Inventors: Douglas Craig Bossen , Kevin Arthur Chiarot , Namratha Rajasekharaiah Jaisimha , Avijit Saha
IPC classification: G06F12/08
CPC classification: G06F11/073 , G06F11/0793 , G06F11/1044 , G06F12/0802
摘要： A method of handling a cache error (such as a parity error), which allows a software recovery, by reporting the error using an unrelated system resource, such as an interrupt service, and particularly a data storage interrupt. The parity error can be reported by generating a data storage interrupt and using the data storage interrupt status register (DSISR) to indicate that the data storage interrupt is a result of the parity error. The context of the processor can be fully synchronized while handling the parity error.
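The sketch below models the reporting idea: a parity error surfaces through an existing interrupt, and a status register says why. A Python exception stands in for the data storage interrupt, with `dsisr` and `dar` attributes. Several parts are assumptions. The bit `DSISR_PARITY` is hypothetical and not taken from any architecture manual. The recovery action (refill the line from memory and retry) assumes memory holds a clean copy of the data, as with a write-through L1.

```python
# Hypothetical DSISR bit flagging "this DSI was caused by an L1 parity error".
DSISR_PARITY = 1 << 20


def parity(word: int) -> int:
    """Even-parity bit for `word`."""
    return bin(word).count("1") % 2


class DataStorageInterrupt(Exception):
    def __init__(self, dsisr: int, dar: int):
        super().__init__(f"DSI dsisr={dsisr:#x} dar={dar:#x}")
        self.dsisr = dsisr      # status register: why the interrupt happened
        self.dar = dar          # faulting address


def load(cache: dict, addr: int) -> int:
    data, stored_parity = cache[addr]
    if parity(data) != stored_parity:
        raise DataStorageInterrupt(DSISR_PARITY, addr)
    return data


def dsi_handler(exc: DataStorageInterrupt, cache: dict, memory: dict) -> bool:
    if exc.dsisr & DSISR_PARITY:
        data = memory[exc.dar]                  # assumes memory holds a clean copy
        cache[exc.dar] = (data, parity(data))   # refill the L1 line
        return True
    return False                                # an ordinary storage fault: not handled here


def load_with_recovery(cache: dict, memory: dict, addr: int) -> int:
    try:
        return load(cache, addr)
    except DataStorageInterrupt as exc:
        if dsi_handler(exc, cache, memory):
            return load(cache, addr)            # retry the faulting load
        raise
```

Reusing the data storage interrupt means the handler runs with a precise, synchronized processor context. That matches the abstract's last sentence, and a new interrupt type is not required.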

7. Granted invention patent

US5682394A Fault tolerant computer memory systems and components employing dual level error correction and detection with disablement feature (status: lapsed)
Publication (announcement) number: US5682394A
Publication (announcement) date: 1997-10-28
Application number: US012186
Application date: 1993-02-02
Applicants: Robert Martin Blake , Douglas Craig Bossen , Chin-Long Chen , John Atkinson Fifield , Howard Leo Kalter
Inventors: Robert Martin Blake , Douglas Craig Bossen , Chin-Long Chen , John Atkinson Fifield , Howard Leo Kalter
IPC classification: G06F11/00 , G06F11/10
CPC classification: G06F11/1052 , G06F11/1008
摘要： In a memory system comprising a plurality of memory units each of which possesses unit-level error correction capabilities and each of which is tied to a system level error correction function, memory reliability is enhanced by providing a mechanism for disabling the unit-level error correction capability, for example, in response to the occurrence of an uncorrectable error in one of the memory units. This counter-intuitive approach which disables an error correction function nonetheless enhances overall memory system reliability since it enables the employment of the complement/recomplement algorithm which depends upon the presence of reproducible errors for proper operation. Thus, chip level error correction systems, which are increasingly desirable at high packaging densities, are employed in a way which does not interfere with system level error correction methods.
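The complement/recomplement algorithm mentioned above can be illustrated with a memory word that has stuck-at bits. The sequence is: read the data, write its complement, read again, then complement that result. Bits that failed to flip are the hard-failing bits. When the read error came from a stuck bit, the recomplemented word has that bit corrected. The sketch assumes a simple stuck-at fault model, and `FaultyWord` is a hypothetical name. The routine only works if the stuck bits reappear on the second read, which is why the abstract disables unit-level correction.

```python
class FaultyWord:
    """A memory word whose bits in `stuck` always read as a fixed value
    (hypothetical stuck-at fault model)."""

    def __init__(self, width: int, stuck: dict):
        self.width = width
        self.stuck = stuck        # {bit_index: 0 or 1}
        self.value = 0

    def write(self, v: int) -> None:
        self.value = v

    def read(self) -> int:
        v = self.value
        for bit, level in self.stuck.items():
            v = (v | (1 << bit)) if level else (v & ~(1 << bit))
        return v


def complement_recomplement(word: FaultyWord):
    """Return (recomplemented data, mask of bits that failed to flip)."""
    mask = (1 << word.width) - 1
    r = word.read()                 # data as read, hard errors included
    word.write(~r & mask)           # store the complement
    r2 = word.read()                # stuck bits come back unchanged
    fixed = ~r2 & mask              # recomplement
    stuck = r ^ fixed               # exactly the bits that did not flip
    return fixed, stuck
```

Example: bit 3 is stuck at 1 and the stored data is `0b10100000`. The first read returns `0b10101000`. The routine then returns the original data together with the stuck-bit mask `0b1000`, and the system-level ECC can check the result.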

8. Granted invention patent

US06223299B1 Enhanced error handling for I/O load/store operations to a PCI device via bad parity or zero byte enables (status: lapsed)
Publication (announcement) number: US06223299B1
Publication (announcement) date: 2001-04-24
Application number: US09072418
Application date: 1998-05-04
Applicants: Douglas Craig Bossen , Charles Andrew McLaughlin , Danny Marvin Neal , James Otto Nicholson , Steven Mark Thurber
Inventors: Douglas Craig Bossen , Charles Andrew McLaughlin , Danny Marvin Neal , James Otto Nicholson , Steven Mark Thurber
IPC classification: G06F11/00
CPC classification: G06F11/0772 , G06F11/0745 , G06F11/0793
Abstract: Device select lines from each I/O device are brought into a PCI host bridge individually, so that the device number of a failing device may be logged in an error register when an error is seen on the PCI bus. Until the error register is reset, each subsequent load and store operation is delayed until the device number of its target device can be checked against the error register. If the target device is a previously failing device, the load/store operation to that device is prevented from completing, either by forcing bad parity or by zeroing all byte enables. When bad parity or zero byte enables are forced, the I/O device responds to the load or store request by activating its device select line but does not accept store data. Operations to devices that are not logged in the error register proceed normally, as do all load/store operations when the error register is clear. Normal system operations are thus not impacted, and operations during error recovery are permitted to proceed if they will cause no further damage.
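The gating behavior of the host bridge can be sketched as follows. This is a behavioral model only. The PCI bus is not simulated, and only the zero-byte-enable variant is modeled, not forced bad parity. The "delay until checked" step is modeled as a check performed before each store is issued. The class and method names are hypothetical.

```python
class Device:
    """A PCI target with one 32-bit register."""

    def __init__(self):
        self.reg = 0

    def store(self, value: int, byte_enables: int) -> None:
        # The target claims the cycle (asserts its device select line)
        # but only updates bytes whose enable bit is set.
        for i in range(4):
            if byte_enables & (1 << i):
                mask = 0xFF << (8 * i)
                self.reg = (self.reg & ~mask) | (value & mask)


class PciHostBridge:
    def __init__(self, devices: dict):
        self.devices = devices          # device number -> Device
        self.error_register = None      # device number logged on a bus error

    def bus_error(self, devsel_line: int) -> None:
        # Individual device select lines tell the bridge which device failed.
        if self.error_register is None:
            self.error_register = devsel_line

    def reset_error_register(self) -> None:
        self.error_register = None

    def store(self, dev: int, value: int) -> None:
        blocked = self.error_register is not None and dev == self.error_register
        byte_enables = 0b0000 if blocked else 0b1111   # zero byte enables: no data accepted
        self.devices[dev].store(value, byte_enables)
```

While device 1 is logged, stores to it complete on the bus without changing its register, and device 2 is unaffected. After the error register is reset, device 1 accepts data again.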

9. Granted invention patent

US5978936A Run time error probe in a network computing environment (status: lapsed)
Publication (announcement) number: US5978936A
Publication (announcement) date: 1999-11-02
Application number: US974574
Application date: 1997-11-19
Applicants: Arun Chandra , Douglas Craig Bossen , Nandakumar Nityananda Tendolkar
Inventors: Arun Chandra , Douglas Craig Bossen , Nandakumar Nityananda Tendolkar
IPC classification: G06F11/277 , H04L12/26 , G06F12/14
CPC classification: H04L43/50 , G06F11/277 , H04L12/2697
摘要： A first set of test instructions are provided for a first node in a computer network. A corresponding second set is provided for a second node in the network. The test instruction sets are partitioned into modules. The nodes process their respective sets of test instructions independently to generate test results for each module on each node, except when a synchronizing event occurs. Each node stores its test results for each test module. Since the test modules have an ordered processing sequence, each node's test results for corresponding test modules can be compared asynchronously on an ongoing basis.
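Because both nodes run the test modules in the same order, a comparator can check module *i* as soon as every node has stored a result for it. The nodes never need to run in lockstep. The sketch below assumes results arrive as in-memory values. The name `AsyncComparator` is hypothetical, and the synchronizing events mentioned in the abstract are not modeled.

```python
class AsyncComparator:
    """Collects per-module results from several nodes and compares each
    module as soon as every node has produced a result for it."""

    def __init__(self, nodes=("A", "B")):
        self.results = {n: [] for n in nodes}
        self.compared = 0               # modules checked so far, in order
        self.mismatches = []            # indices of modules whose results differ

    def store(self, node: str, result) -> None:
        self.results[node].append(result)
        ready = min(len(r) for r in self.results.values())
        while self.compared < ready:
            i = self.compared
            column = [r[i] for r in self.results.values()]
            if any(v != column[0] for v in column[1:]):
                self.mismatches.append(i)
            self.compared += 1
```

Node A can run ahead of node B. Comparison of each module simply waits for the slower node to reach it, and a faulty result on either node is reported by module index.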

10. Granted invention patent

US06636981B1 Method and system for end-to-end problem determination and fault isolation for storage area networks (status: in force)
Publication (announcement) number: US06636981B1
Publication (announcement) date: 2003-10-21
Application number: US09478306
Application date: 2000-01-06
Applicants: Barry Stanley Barnett , Douglas Craig Bossen
Inventors: Barry Stanley Barnett , Douglas Craig Bossen
IPC classification: G06F15/177
CPC classification: G06F11/0781 , G06F11/0727 , H04L41/022 , H04L41/0609 , H04L41/064 , H04L41/065
Abstract: A method and system for problem determination and fault isolation in a storage area network (SAN) is provided. A complex configuration of multi-vendor host systems, FC switches, and storage peripherals is connected in a SAN via a communications architecture (CA). A communications architecture element (CAE) is a network-connected device that has successfully registered with a communications architecture manager (CAM) on a host computer via a network service protocol. The CAM contains problem determination (PD) functionality for the SAN and maintains a SAN PD information table (SPDIT). The CA comprises all network-connected elements capable of communicating information stored in the SPDIT. The CAM uses a SAN topology map and the SPDIT to create a SAN diagnostic table (SDT). A failing component in a particular device may generate errors that cause devices along the same network connection path to generate errors. As the CAM receives error packets or error messages, the errors are stored in the SDT, and each error is analyzed by comparing it temporally and spatially with the other errors in the SDT. If a CAE is determined to be a candidate for generating the error, the CAE is reported for replacement if possible.
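One simple way to combine the temporal and spatial comparison described above is the following. Take the errors reported within a time window of the new error, then intersect their host-to-storage paths. A component common to all of them is a candidate for the root cause. This intersection rule is an assumption for illustration and is not the patent's stated algorithm. The function name, the `(time, path)` SDT entry format and the default window are also hypothetical.

```python
def candidates(sdt: list, new_error: tuple, window: float = 5.0) -> list:
    """sdt: list of (time, path) entries, where path is a tuple of component
    names from host to storage. Returns the components common to the path
    of new_error and to every path reported within `window` seconds of it."""
    t0, path0 = new_error
    common = set(path0)
    for t, path in sdt:
        if abs(t - t0) <= window:        # temporal correlation
            common &= set(path)          # spatial correlation: shared components
    sdt.append(new_error)                # record the new error in the SDT
    return sorted(common)
```

Suppose errors arrive within a few seconds from host A to array 1, host B to array 2, and host A to array 3, and all three paths run through `switch1`. Then only the switch remains as a candidate. An isolated error leaves every component on its path as a candidate.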
