Fault Injection in Virtualized Systems—Challenges and Applications

We analyze the interaction between system virtualization and fault injection in two directions: (i) using virtualization to facilitate fault injection into non-virtualized systems, and (ii) using fault injection to evaluate the dependability of virtualized systems. We explore the benefits of using virtualization for fault injection and discuss the challenges of implementing fault injection in virtualized systems, along with resolutions to those challenges. For experimental evaluation, we use a test platform that combines Gigan, a fault injector that we developed, with the Xen virtual machine monitor. We evaluate the degree to which fault injection results obtained with the target system running in a virtual machine are comparable to those obtained with the target system running on bare hardware. We also compare results when faults are injected from within the target system with results when they are injected from the hosting hypervisor, and we evaluate the performance benefits of leveraging system virtualization for fault injection. Finally, we demonstrate the capabilities of our injector and highlight the benefits of leveraging system virtualization for fault injection by describing deployments of Gigan to evaluate both non-virtualized and virtualized systems.
