Fault-tolerance in the Advanced Automation System

The Advanced Automation System (AAS) is a distributed real-time system under development by IBM's Systems Integration Division for the US Federal Aviation Administration. It is intended to replace the present en-route and terminal approach US air traffic control computer systems over the next decade. High availability of air traffic control services is an essential requirement of the system. This paper discusses the general approach to fault-tolerance adopted in AAS by reviewing some of the questions asked during the system design, the alternative solutions considered, and the reasons for the design choices made.