G4: a fault-tolerant CMOS mainframe

G4 is IBM's fourth-generation CMOS microprocessor-based S/390 mainframe, but the first to achieve fault-tolerance equivalence with, or superiority over, its predecessor ECL mainframes. CMOS technology provides much greater density and integration, which gives superior fault-avoidance characteristics. The lower power consumption of CMOS makes bulk power redundancy and battery backup practical. However, the high density and circuit properties of CMOS pose new challenges for error detection, recovery, and online repair. G4 implements an innovative design for a high-performance, fault-tolerant, single-chip microprocessor. Microprocessor sparing is used as a concurrent repair mechanism. Increased memory density requires new (76,64) S4EC/DED error correction codes (single 4-bit symbol error correcting, double-bit error detecting) so that all single-chip failures are correctable. As many as four I/O interfaces are packaged on an individual card, requiring both configuration management and automated maintenance procedures to ensure that all devices maintain connectivity during online repair.
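The connectivity requirement for online I/O repair can be illustrated with a minimal sketch. It is not the G4 maintenance procedure, which the abstract does not describe; it only shows the kind of check implied, under the assumption that each device is reachable through a known set of I/O cards. The device names, card names, and both function names are hypothetical.

```python
from typing import Dict, List, Set


def devices_losing_connectivity(paths: Dict[str, Set[str]], card: str) -> List[str]:
    """Return the devices that would have no remaining path if `card` were removed.

    `paths` maps each device name to the set of I/O cards it is attached to
    (a hypothetical configuration table, assumed for illustration).
    """
    return sorted(dev for dev, cards in paths.items() if not (cards - {card}))


def can_repair_online(paths: Dict[str, Set[str]], card: str) -> bool:
    """A card is safe to pull only if every device keeps at least one other path."""
    return not devices_losing_connectivity(paths, card)


# Hypothetical configuration: two disks are dual-pathed, the tape unit is not.
paths = {
    "dasd0": {"card_a", "card_b"},
    "dasd1": {"card_a", "card_c"},
    "tape0": {"card_b"},
}

print(can_repair_online(paths, "card_a"))            # every device keeps another path
print(devices_losing_connectivity(paths, "card_b"))  # devices that would be cut off
```

In this sketch, a card carrying the only path to some device is reported as unsafe to remove, which is the situation that configuration management is meant to prevent when several interfaces share one card.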
