Automatic recovery from runtime failures

We present a technique to make applications resilient to failures. The technique is intended to keep a faulty application functional in the field while developers work on permanent and radical fixes. We target field failures in applications built on reusable components. In particular, the technique exploits the intrinsic redundancy of those components by identifying workarounds: alternative uses of the faulty components that avoid the failure. The technique is currently implemented for Java applications but makes few or no assumptions about the nature of the application. It works without interrupting the execution flow of the application and without restarting its components. We demonstrate and evaluate this technique on four mid-size applications and two popular libraries of reusable components affected by real and seeded faults. In these cases the technique is effective, keeping the application fully functional in the presence of between 19% and 48% of the failure-causing faults, depending on the application. The experiments also show that the technique incurs an acceptable runtime overhead in all cases.
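The following Java sketch makes the idea of a workaround concrete. It wraps a call to a component so that, if the call fails, alternative uses of the same component that are assumed to be equivalent are tried in order. Everything here is hypothetical and for illustration only. `FaultyList` stands in for a reusable component with a seeded fault in `addAll`, and `runWithWorkarounds` is an invented helper, not the technique's actual interface. The sketch also assumes that the failing call leaves no partial side effects, so it restores no state before trying an alternative. It does not show how failures are detected or how equivalent alternatives are discovered.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.function.Supplier;

public class WorkaroundDemo {

    /** Hypothetical faulty component: addAll fails on collections with more than 2 elements. */
    static class FaultyList extends ArrayList<String> {
        @Override
        public boolean addAll(Collection<? extends String> c) {
            if (c.size() > 2) {
                // Seeded fault: throws before modifying the list, so no partial state is left.
                throw new IllegalStateException("seeded fault in addAll");
            }
            return super.addAll(c);
        }
    }

    /**
     * Runs the original operation. If it throws, tries each alternative
     * (assumed to be functionally equivalent) in order and returns the first
     * successful result. If every alternative fails, rethrows the original failure.
     * Assumption: a failed attempt leaves no side effects, so no rollback is done.
     */
    static <T> T runWithWorkarounds(Supplier<T> original, List<Supplier<T>> alternatives) {
        try {
            return original.get();
        } catch (RuntimeException failure) {
            for (Supplier<T> alt : alternatives) {
                try {
                    return alt.get();
                } catch (RuntimeException ignored) {
                    // This alternative also failed; try the next one.
                }
            }
            throw failure;
        }
    }

    public static void main(String[] args) {
        FaultyList list = new FaultyList();
        List<String> items = List.of("a", "b", "c");

        // Original use of the component.
        Supplier<Boolean> original = () -> list.addAll(items);
        // Equivalent use of the same component: add the elements one at a time.
        Supplier<Boolean> oneByOne = () -> {
            boolean changed = false;
            for (String s : items) {
                changed |= list.add(s);
            }
            return changed;
        };

        boolean changed = runWithWorkarounds(original, List.of(oneByOne));
        System.out.println(changed + " " + list);
    }
}
```

Running `main` prints `true [a, b, c]`. The call to `addAll` throws, and the equivalent sequence of single `add` calls produces the same list without the caller observing the failure.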