RETracer: Triaging Crashes by Reverse Execution from Partial Memory Dumps

Many software providers operate crash reporting services to automatically collect crashes from millions of customers and file bug reports. Precisely triaging crashes is necessary and important for software providers because the millions of crashes that may be reported every day are critical in identifying high impact bugs. However, the triaging accuracy of existing systems is limited, as they rely only on the syntactic information of the stack trace at the moment of a crash without analyzing program semantics.In this paper, we present RETracer, the first system to triage software crashes based on program semantics reconstructed from memory dumps. RETracer was designed to meet the requirements of large-scale crash reporting services. RETracer performs binary-level backward taint analysis without a recorded execution trace to understand how functions on the stack contribute to the crash. The main challenge is that the machine state at an earlier time cannot be recovered completely from a memory dump, since most instructions are information destroying.We have implemented RETracer for x86 and x86-64 native code, and compared it with the existing crash triaging tool used by Microsoft. We found that RETracer eliminates two thirds of triage errors based on a manual analysis of 140 bugs fixed in Microsoft Windows and Office. RETracer has been deployed as the main crash triaging system on Microsoft’s crash reporting service.

[1]  James R. Larus,et al.  The use of program profiling for software maintenance with applications to the year 2000 problem , 1997, ESEC '97/FSE-5.

[2]  David A. Wagner,et al.  Dynamic Test Generation to Find Integer Bugs in x86 Binary Linux Programs , 2009, USENIX Security Symposium.

[3]  Alessandro Orso,et al.  BugRedux: Reproducing field failures for in-house debugging , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[4]  Dongmei Zhang,et al.  ReBucket: A method for clustering duplicate crash reports based on call stack similarity , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[5]  W. Evans A core dump , 2012 .

[6]  Bin Wang,et al.  Automated support for classifying software failure reports , 2003, 25th International Conference on Software Engineering, 2003. Proceedings..

[7]  John R. Douceur,et al.  Cycles, cells and platters: an empirical analysisof hardware failures on a million consumer PCs , 2011, EuroSys '11.

[8]  Yu Cao,et al.  SymCrash: selective recording for reproducing crashes , 2014, ASE.

[9]  Yuanyuan Zhou,et al.  Triage: diagnosing production run failures at the user's site , 2007, SOSP.

[10]  Manu Sridharan,et al.  PSE: explaining program failures via postmortem static analysis , 2004, SIGSOFT '04/FSE-12.

[11]  Sanjay Bhansali,et al.  Framework for instruction-level tracing and analysis of program executions , 2006, VEE '06.

[12]  Andreas Zeller,et al.  Reconstructing Core Dumps , 2013, 2013 IEEE Sixth International Conference on Software Testing, Verification and Validation.

[13]  George Candea,et al.  Execution synthesis: a technique for automated software debugging , 2010, EuroSys '10.

[14]  Steven P. Reiss,et al.  Fault localization with nearest neighbor queries , 2003, 18th IEEE International Conference on Automated Software Engineering, 2003. Proceedings..

[15]  Guy M. Lohman,et al.  Automatically Identifying Known Software Problems , 2007, 2007 IEEE 23rd International Conference on Data Engineering Workshop.

[16]  A. Zeller Isolating cause-effect chains from computer programs , 2002, SIGSOFT '02/FSE-10.

[17]  Mark Weiser,et al.  Programmers use slices when debugging , 1982, CACM.

[18]  林俊彦 用Visual Studio实践敏捷测试(一) , 2010 .

[19]  Zhang,et al.  Debugging Tools来了,不再怕Windows的“大蓝脸” , 2008 .

[20]  Cristian Zamfir Execution Synthesis: A Technique for Automating the Debugging of Software , 2013 .

[21]  Gail C. Murphy,et al.  Who should fix this bug? , 2006, ICSE.

[22]  Alex Groce,et al.  Taming compiler fuzzers , 2013, ACM-SIGPLAN Symposium on Programming Language Design and Implementation.

[23]  Robert E. Strom,et al.  Typestate: A programming language concept for enhancing software reliability , 1986, IEEE Transactions on Software Engineering.

[24]  H. Cleve,et al.  Locating causes of program failures , 2005, Proceedings. 27th International Conference on Software Engineering, 2005. ICSE 2005..

[25]  Galen C. Hunt,et al.  Debugging in the (very) large: ten years of implementation and experience , 2009, SOSP '09.

[26]  Manu Sridharan,et al.  Thin slicing , 2007, PLDI '07.

[27]  Joseph Robert Horgan,et al.  Fault localization using execution slices and dataflow tests , 1995, Proceedings of Sixth International Symposium on Software Reliability Engineering. ISSRE'95.

[28]  References , 1971 .

[29]  Rahul Premraj,et al.  Do stack traces help developers fix bugs? , 2010, 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010).

[30]  Michael D. Ernst,et al.  ReCrash: Making Software Failures Reproducible by Preserving Object States , 2008, ECOOP.

[31]  Sudheendra Hangal,et al.  Tracking down software bugs using automatic anomaly detection , 2002, ICSE '02.

[32]  Nachiappan Nagappan,et al.  Crash graphs: An aggregated view of multiple crashes to improve crash triage , 2011, 2011 IEEE/IFIP 41st International Conference on Dependable Systems & Networks (DSN).

[33]  Rongxin Wu,et al.  CrashLocator: locating crashing faults based on crash stacks , 2014, ISSTA 2014.

[34]  George Candea,et al.  Automated Debugging for Arbitrarily Long Executions , 2013, HotOS.

[35]  Nicholas Jalbert,et al.  Automated duplicate detection for bug tracking systems , 2008, 2008 IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN).

[36]  Vikram S. Adve,et al.  Using likely invariants for automated software fault localization , 2013, ASPLOS '13.