The Role of a Maintenance Processor for a General-Purpose Computer System

Research and development in fault-tolerant computing has shown that a dedicated processor, called a maintenance processor, can efficiently monitor, control, and maintain the operation of its host computer. This paper presents the general system structure and common functional capabilities of the maintenance processor, and illustrates its use with a survey of actual implementations available in the general-purpose computer industry. An analytical model is then presented to evaluate the impact of the maintenance processor on the reliability, availability, and serviceability (RAS) of the host system. The examples given show that the unavailability of a typical maintenance processor causes negligible additional downtime and system failures. This observation, together with others in the paper, strongly indicates that a maintenance processor can be designed and used as the focal point of most system support activities. This approach simplifies the hardware and software structure of the host computer and improves the total system RAS.
