Failure Sketches: A Better Way to Debug

One of the main reasons debugging is hard and time-consuming is that existing debugging tools do not explain the root causes of failures. Moreover, existing techniques either rely on expensive runtime recording or assume the existence of a program input that reliably reproduces the failure, which makes them difficult to apply in production. Consequently, developers spend precious time chasing elusive bugs, at a real cost to productivity. We propose a new debugging technique, called failure sketching, that provides the developer with a high-level explanation of the root cause of a failure. A failure sketch achieves this goal because: 1) it contains only the program statements that cause the failure; and 2) it shows which program properties differ between failing and successful executions. We argue that failure sketches can be built by combining in-house static analysis with crowdsourced dynamic analysis. To build a failure sketch, we do not assume that developers can reproduce the failure. We show preliminary evidence that failure sketches can significantly improve programmer productivity.
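To make the two properties of a failure sketch concrete, here is a minimal, hypothetical C program (not from the paper) with an atomicity violation on a shared counter. The comments mark which statements a sketch would retain (only those involved in the failure) and which property differs between failing and successful runs (the interleaving of the read and the write-back); the paper's actual sketches are richer diagrams over such statements.

```c
#include <pthread.h>
#include <stdio.h>

static int counter = 0;  /* shared state involved in the failure */

static void *worker(void *arg) {
    (void)arg;
    int tmp = counter;   /* kept in sketch: read of shared state */
    tmp = tmp + 1;       /* kept in sketch: local update */
    counter = tmp;       /* kept in sketch: racy write-back */
    return NULL;         /* dropped from sketch: failure-irrelevant */
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    /* Failing runs: both threads read counter == 0 before either
       writes back, so the final value is 1. Successful runs: the
       read-modify-write sequences do not interleave and the final
       value is 2. That ordering difference is the kind of "differing
       property" a failure sketch would surface. */
    printf("counter = %d (expected 2)\n", counter);
    return counter == 2 ? 0 : 1;
}
```

Note that the failing interleaving is rare in practice, which is exactly why such bugs are hard to reproduce on demand and why failure sketching avoids assuming a failure-reproducing input.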
