Exhaustive Exploration of the Failure-Oblivious Computing Search Space

High availability of software systems requires automated handling of crashes in the presence of errors. Failure-oblivious computing is one technique that aims to achieve high availability. We note that failure-obliviousness has not yet been studied in depth, and there are very few studies that help explain why failure-oblivious techniques work. For failure-oblivious computing to have an impact in practice, we need a deep understanding of failure-oblivious behaviors in software. In this paper, we design and perform an experiment that analyzes the size and the diversity of failure-oblivious behaviors. Our experiment consists of exhaustively computing the search space of 16 field failures of large-scale open-source Java software. The outcome of this experiment is a much better understanding of what really happens when failure-oblivious computing is used, and it opens promising new research directions.
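To make the idea of a failure-oblivious search space more concrete, here is a toy sketch in Java. It is not the authors' tool. The program being run, the three strategies (`SKIP`, `PLACEHOLDER`, `ABORT_LOOP`) and every class and method name are hypothetical. Each time the toy program would dereference a `null` element, a failure-oblivious runtime picks a strategy instead of crashing. The sequence of choices made in one run is a path in a decision tree. The explorer re-executes the program once per path, depth-first, and so enumerates the whole space.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model of exhaustive failure-oblivious exploration (not the paper's tool).
public class FailureObliviousExplorer {

    // Hypothetical strategies applied when a null element would be dereferenced.
    enum Strategy { SKIP, PLACEHOLDER, ABORT_LOOP }

    // Supplies one strategy per failure point, replaying a recorded prefix of decisions.
    static final class Decider {
        final List<Integer> decisions; // indices into Strategy.values()
        int next = 0;                  // number of decisions consumed so far

        Decider(List<Integer> prefix) {
            this.decisions = new ArrayList<>(prefix);
        }

        Strategy decide() {
            if (next == decisions.size()) {
                decisions.add(0); // new decision point: try the first strategy first
            }
            return Strategy.values()[decisions.get(next++)];
        }
    }

    // Toy program: sums string lengths; calling s.length() on a null s is the "failure".
    static int totalLength(List<String> items, Decider decider) {
        int total = 0;
        for (String s : items) {
            if (s == null) { // failure detected just before the dereference
                Strategy strategy = decider.decide();
                if (strategy == Strategy.ABORT_LOOP) {
                    return total;  // leave the loop early with the partial result
                }
                if (strategy == Strategy.PLACEHOLDER) {
                    total += 1;    // behave as if s were the one-character string "?"
                }
                continue;          // SKIP (and PLACEHOLDER): move on to the next element
            }
            total += s.length();
        }
        return total;
    }

    // Depth-first enumeration of every decision sequence by re-executing the program.
    public static List<Integer> exploreAll(List<String> items) {
        int options = Strategy.values().length;
        List<Integer> outcomes = new ArrayList<>();
        List<Integer> prefix = new ArrayList<>();
        while (true) {
            Decider decider = new Decider(prefix);
            outcomes.add(totalLength(items, decider));
            // The decisions actually taken in this run form one path of the tree.
            List<Integer> path = new ArrayList<>(decider.decisions.subList(0, decider.next));
            // Backtrack: drop trailing decision points whose strategies are exhausted.
            while (!path.isEmpty() && path.get(path.size() - 1) == options - 1) {
                path.remove(path.size() - 1);
            }
            if (path.isEmpty()) {
                return outcomes; // every path has been executed
            }
            path.set(path.size() - 1, path.get(path.size() - 1) + 1); // next sibling
            prefix = path;
        }
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("ab", null, "cde", null);
        List<Integer> outcomes = exploreAll(input);
        Set<Integer> distinct = new LinkedHashSet<>(outcomes);
        System.out.println("explored executions: " + outcomes.size());
        System.out.println("distinct outcomes: " + distinct);
    }
}
```

On the input `["ab", null, "cde", null]` the explorer runs the toy program 7 times and observes 4 distinct results (5, 6, 7 and 2). This is a small-scale analogue of measuring the size and the diversity of a failure-oblivious search space. Re-execution with a replayed decision prefix is one simple way to enumerate such a tree. It assumes the program is deterministic, so that the same prefix always leads back to the same decision points.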
