Exhaustive Exploration of the Failure-Oblivious Computing Search Space

High availability of software systems requires automated handling of crashes in the presence of errors. Failure-oblivious computing is one technique that aims to achieve high availability. Failure-obliviousness has not yet been studied in depth, and very few studies help explain why failure-oblivious techniques work. For failure-oblivious computing to have an impact in practice, we need a deep understanding of failure-oblivious behaviors in software. In this paper, we design and perform an experiment that analyzes the size and the diversity of failure-oblivious behaviors. Our experiment consists of exhaustively computing the search space of 16 field failures of large-scale open-source Java software. The outcome of this experiment is a much better understanding of what really happens when failure-oblivious computing is used, which opens promising new research directions.
