Distributed System Design Checklist

This report presents a design checklist for fault-tolerant distributed electronic systems. Many of its questions and discussions apply to the development of any safety-critical system, but the primary focus is on issues specific to distributed electronic system design. The questions are intended to stimulate system designers' thinking and to help them establish a broader perspective from which to assess a system's dependability and fault-tolerance mechanisms. Although we made a best effort to make the checklist as comprehensive as possible, it is not (and cannot be) complete. Instead, we expect the list of questions and their associated rationale to evolve as lessons are learned and further knowledge is established. To that end, we intend to post the checklist questions on a suitable public web forum, such as the NASA DASHLink AFCS repository, where we hope they can be updated, extended, and maintained after our initial research is complete.
