Fault injection in distributed Java applications

In a network consisting of several thousand computers, the occurrence of faults is unavoidable. Being able to test the behaviour of a distributed program in an environment where faults (such as the crash of a process) can be controlled is important for deploying reliable programs. In this paper, we investigate the possibility of injecting software faults in distributed Java applications. Our scheme extends the FAIL-FCI software. It does not require any modification of the source code of the application under test, while retaining the possibility to write high-level fault scenarios. As a proof of concept, we use our tool to test FreePastry, an existing Java implementation of a distributed hash table (DHT), against node failures.
