Using Checkpointing and Virtualization for Fault Injection

The program monitoring and control mechanisms of virtualization tools are becoming increasingly standardized and advanced. Combined with checkpointing, these mechanisms can serve as a foundation for general program analysis tools. We explore this idea with an architecture we call Checkpoint-based Fault Injection (CFI) and with two concrete implementations built on different existing virtualization tools: DMTCP and SBUML. The two implementations show trade-offs between versatility and performance, and together they demonstrate the generality of the architecture.
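The abstract does not spell out how CFI works internally, so the following is only a toy sketch of the general idea of combining checkpoints with fault injection: save the program state before each operation, then restore that state and re-run the operation with a fault forced. Everything here is an assumption made for illustration. The names `run_with_fault_injection`, `InjectedFault`, `write_record`, and `write_with_retry` are hypothetical, and `copy.deepcopy` on a dictionary stands in for a real process or virtual machine checkpoint such as those provided by DMTCP or SBUML.

```python
import copy


class InjectedFault(Exception):
    """Hypothetical fault raised in place of an operation's normal result."""


def run_with_fault_injection(initial_state, steps):
    """Run all steps once without faults, checkpointing before each step.

    Afterwards, restore each checkpoint in turn and re-run the matching
    step with a fault injected, recording whether the step handled it.
    """
    checkpoints = []
    state = copy.deepcopy(initial_state)
    for step in steps:
        # In-memory copy as a stand-in for a real checkpoint (assumption).
        checkpoints.append(copy.deepcopy(state))
        step(state, fail=False)

    outcomes = []
    for index, step in enumerate(steps):
        # Restore the state saved just before this step.
        restored = copy.deepcopy(checkpoints[index])
        try:
            step(restored, fail=True)
            outcomes.append((index, "handled", restored))
        except InjectedFault:
            outcomes.append((index, "unhandled", restored))
    return state, outcomes


def write_record(state, fail):
    """Hypothetical operation with no error handling."""
    if fail:
        raise InjectedFault("simulated write failure")
    state["records"].append("data")


def write_with_retry(state, fail):
    """Hypothetical operation that recovers from a failed write by retrying."""
    try:
        write_record(state, fail)
    except InjectedFault:
        state["retries"] += 1
        write_record(state, fail=False)


if __name__ == "__main__":
    final_state, outcomes = run_with_fault_injection(
        {"records": [], "retries": 0}, [write_record, write_with_retry]
    )
    print(final_state)
    for index, verdict, state_after in outcomes:
        print(index, verdict, state_after)
```

In this sketch, each fault scenario starts from the saved state instead of re-executing the program from the beginning, which is the benefit that checkpointing is meant to provide. A real implementation would checkpoint whole processes or virtual machines, not a Python dictionary.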
