Automatically detecting missing cleanup for ungraceful exits

Software exits ungracefully either because of bugs in its interrupt/signal handler code or because developers deliberately interrupt it while debugging. Users may then suffer from "weird" problems caused by the leftovers of these ungraceful exits. A common fix is to reboot, which wipes away the stale state of the software. This solution, however, is heavyweight and often leads to a poor user experience because it also restarts other, unaffected processes. In this paper, we design SafeExit, a tool that automatically detects and pinpoints the root causes of problems caused by ungraceful exits, helping users fix them with lightweight solutions. Specifically, SafeExit checks a program's exit behaviors under an interrupted execution against its expected exit behaviors to detect the missing cleanup behaviors that are required to avoid an ungraceful exit. The expected behaviors are obtained by monitoring the program's exit under a normal execution. We apply SafeExit to 38 programs across 10 domains. SafeExit finds 133 types of cleanup behaviors in 36 programs and detects 2861 missing behaviors in 292 interrupted executions. To predict missing behaviors for unseen input scenarios, SafeExit trains prediction models on a set of sampled input scenarios. The results show that SafeExit is accurate, with an average F-measure of 92.5%.
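
The core check described above, comparing the exit behaviors of an interrupted run against those of a normal run, can be illustrated with a minimal sketch. This is not SafeExit's implementation: the representation of a cleanup behavior as an (operation, target) pair, the function names `missing_cleanup` and `f_measure`, and the sample traces are all assumptions made for illustration only.

```python
from typing import Iterable, Set, Tuple

# Assumption: a cleanup behavior is modeled as an (operation, target) pair,
# e.g. ("unlink", "/tmp/app.lock"). The paper's actual representation may differ.
Behavior = Tuple[str, str]


def missing_cleanup(expected: Iterable[Behavior],
                    observed: Iterable[Behavior]) -> Set[Behavior]:
    """Return behaviors seen at a normal exit but absent from an interrupted exit."""
    return set(expected) - set(observed)


def f_measure(predicted: Set[Behavior], actual: Set[Behavior]) -> float:
    """Standard F1 score: harmonic mean of precision and recall."""
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


# Hypothetical traces: what the program does when it exits normally,
# and what it managed to do when it was interrupted.
normal_exit = [
    ("unlink", "/tmp/app.lock"),
    ("close", "socket:8080"),
    ("unlink", "/tmp/app.pid"),
]
interrupted_exit = [("close", "socket:8080")]

missing = missing_cleanup(normal_exit, interrupted_exit)
for operation, target in sorted(missing):
    print(f"missing cleanup: {operation} {target}")
```

A report of this kind points to the specific leftover (here, a stale lock file and PID file) that a user could remove by hand instead of rebooting. The `f_measure` helper shows how predicted missing behaviors could be scored against the actually missing ones, in the spirit of the accuracy figure reported above.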
