A failure analysis of data distribution middleware in a mission-critical system for air traffic control

Middleware plays a strategic role in reducing development cost and time to market. However, it raises significant dependability challenges when integrated into complex, mission-critical systems. Testing carried out during the development of middleware platforms may not be enough to assure an adequate dependability level after their integration. Middleware failures and their impact on the system as a whole have to be carefully evaluated in critical scenarios. This paper reports practical experience with a real-world, middleware-based Air Traffic Control (ATC) system being developed in the context of an academic-industrial collaboration. Two equivalent middleware subsystems for data distribution are compared from the dependability point of view. We identify the internal dependencies and execution environment resources that characterize both solutions. By means of an extensive failure mode emulation campaign, we show that these architectural features can significantly affect the dependability level of the middleware and of the overall system.
