A Chaos Engineering System for Live Analysis and Falsification of Exception-Handling in the JVM

Software systems contain resilience code to handle the failures and unexpected events that occur in production. It is essential for developers to understand and assess the resilience of their systems. Chaos engineering is a technique that aims to assess resilience and uncover weaknesses by actively injecting perturbations in production. In this paper, we propose a novel design and implementation of a chaos engineering system for Java called ChaosMachine. It provides a unique and actionable analysis of exception-handling capabilities in production, at the level of try-catch blocks. To evaluate our approach, we have deployed ChaosMachine on top of three large-scale and well-known Java applications totaling 630k lines of code. Our results show that ChaosMachine reveals both strengths and weaknesses of the resilience code of a software system at the level of exception handling.
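To make the idea of perturbation at the level of try-catch blocks concrete, the following is a minimal sketch and not ChaosMachine's actual implementation or API. It assumes a hypothetical injection point (`maybeInject`) placed at the start of a try block and a hypothetical global switch (`injectionActive`); how the real system instruments and controls try blocks is not described here. The sketch throws the exception type that the catch block declares, then checks a simple hypothesis: the caller still receives a usable result.

```java
import java.io.IOException;

final class TryCatchPerturbationSketch {

    // Hypothetical switch: a real system would control injection per try block at run time.
    static volatile boolean injectionActive = false;

    // Hypothetical injection point: throws the exception type handled by the catch block.
    static void maybeInject() throws IOException {
        if (injectionActive) {
            throw new IOException("injected by chaos experiment");
        }
    }

    // Application code; its catch block is the resilience code under study.
    static String loadGreeting() {
        try {
            maybeInject();
            return "hello from primary source";
        } catch (IOException e) {
            // Recovery path: fall back to a default value.
            return "hello from fallback";
        }
    }

    // Runs the try block with injection enabled and checks the hypothesis
    // that the visible behavior is still acceptable (a greeting is returned).
    static String runExperiment() {
        injectionActive = true;
        String perturbed;
        try {
            perturbed = loadGreeting();
        } finally {
            injectionActive = false; // always stop the perturbation
        }
        return perturbed.startsWith("hello") ? "hypothesis holds" : "hypothesis falsified";
    }

    public static void main(String[] args) {
        System.out.println("baseline:  " + loadGreeting());
        System.out.println("perturbed: " + runExperiment());
    }
}
```

In this toy case the catch block masks the injected exception, so the hypothesis holds. A catch block that rethrew the exception or returned an unusable value would instead falsify it, which is the kind of weakness such an analysis is meant to surface.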
