Cesium: Testing Hard Real-time and Dependability Properties of Distributed Protocols

Cesium is an object-oriented environment for testing whether implementations of real-time, fault-tolerant protocols satisfy the safety and timeliness properties prescribed by their specifications. Protocol implementations are tested under configurable workloads and failure scenarios. A centralized simulator executes all tasks in a single address space while providing the appearance of truly distributed execution. Experiments can be reproduced exactly any number of times, and Cesium provides an unprecedented degree of monitoring and control over them. It is not necessary to instrument (or even to have access to) the source code of the protocols under test. The observed behaviors correspond exactly to executions in the real system being simulated, as Cesium does not change the time of occurrence of any event. Besides providing a testing and performance evaluation environment superior to real distributed systems, Cesium can test properties of existing protocols that cannot be tested in any distributed environment.
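
The idea of a centralized simulator with exactly reproducible experiments can be illustrated with a minimal sketch. This is not Cesium's interface: the names `Simulator`, `echo_handler`, and `run_experiment`, the delay bounds, and the message-drop failure model are all assumptions made for illustration. The sketch runs every task in one process under a virtual clock, draws message delays and injected message losses from a seeded random source, and records a trace from outside the protocol code, so the protocol itself is not instrumented.

```python
import heapq
import random


class Simulator:
    """Toy centralized discrete-event simulator (hypothetical, not Cesium's API)."""

    def __init__(self, seed, drop_probability=0.0):
        self.rng = random.Random(seed)    # seeded, so every run is reproducible
        self.drop_probability = drop_probability
        self.now = 0.0                    # virtual time, not wall-clock time
        self.queue = []                   # heap of (time, sequence, receiver, payload)
        self.sequence = 0                 # tie-breaker giving a deterministic order
        self.trace = []                   # events observed by the simulator itself

    def send(self, sender, receiver, message, min_delay=1.0, max_delay=5.0):
        # Injected failure: the message is lost with the configured probability.
        if self.rng.random() < self.drop_probability:
            self.trace.append((self.now, "drop", sender, receiver, message))
            return
        delay = self.rng.uniform(min_delay, max_delay)
        event = (self.now + delay, self.sequence, receiver, (sender, message))
        heapq.heappush(self.queue, event)
        self.sequence += 1

    def run(self, handlers):
        # Deliver events in virtual-time order until none remain.
        while self.queue:
            self.now, _, receiver, (sender, message) = heapq.heappop(self.queue)
            self.trace.append((self.now, "deliver", sender, receiver, message))
            handlers[receiver](self, receiver, sender, message)
        return self.trace


def echo_handler(sim, me, sender, message):
    # Protocol under test (assumed for illustration): answer each "ping" once.
    if message == "ping":
        sim.send(me, sender, "pong")


def run_experiment(seed, drop_probability):
    sim = Simulator(seed, drop_probability)
    for _ in range(10):                   # configurable workload: ten pings
        sim.send("a", "b", "ping")
    return sim.run({"a": echo_handler, "b": echo_handler})


if __name__ == "__main__":
    first = run_experiment(seed=42, drop_probability=0.2)
    second = run_experiment(seed=42, drop_probability=0.2)
    print("reproducible:", first == second)
    # Timeliness check: a ping and its reply each take at most 5 time units.
    deliveries = [event for event in first if event[1] == "deliver"]
    print("timely:", all(event[0] <= 10.0 for event in deliveries))
```

Because the virtual clock and the failure decisions depend only on the seed, rerunning an experiment with the same seed yields the same trace, and a timeliness property can be checked against that trace after the run.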
