A human study of fault localization accuracy

Localizing and repairing defects are critical software engineering activities, but not all programs and not all bugs are equally easy to debug. We present formal models, backed by a human study involving 65 participants (from both academia and industry) and 1830 total judgments, that relate various software- and defect-related features to human accuracy at locating errors. Our study uses example code from Java textbooks, which helps us control for both readability and complexity. We find that certain types of defects are much harder for humans to locate accurately. For example, in our experiments humans are over five times more accurate at locating “extra statements” than “missing statements”. We also find that, independent of the type of defect involved, certain code contexts are harder to debug than others. For example, humans are over three times more accurate at finding defects in code that provides an array abstraction than in code that provides a tree abstraction. We identify and analyze code features that are predictive of human fault localization accuracy. Finally, we present a formal model of debugging accuracy based on those source code features that have a statistically significant correlation with human performance.
