Does the fault reside in a stack trace? Assisting crash localization by predicting crashing fault residence

Abstract: Given a stack trace reported at the time of a software crash, crash localization aims to pinpoint the root cause of the crash. Crash localization is known to be a time-consuming and labor-intensive task. Without tool support, developers must rely on their experience and spend tedious manual effort examining large amounts of source code. In this paper, we propose an automatic approach, CraTer, which predicts whether or not a crashing fault resides in the stack trace (referred to as predicting crashing fault residence). We extract 89 features from stack traces and source code to train a predictive model on known crashes. We then use the model to predict the fault residence of newly submitted crashes. CraTer can reduce the search space for crashing faults and help prioritize crash localization efforts. Experimental results on crashes from seven real-world projects demonstrate that CraTer achieves an average accuracy of over 92%.
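To make the notion of crashing fault residence concrete, the following is a minimal sketch in Python, not the CraTer implementation. It assumes Java-style stack traces; the label names `InTrace` and `OutTrace`, the helper functions, and the handful of features shown are hypothetical illustrations and are not the 89 features used in the paper.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Matches Java frames such as "at org.example.Foo.bar(Foo.java:42)".
FRAME_RE = re.compile(
    r"^\s*at\s+([\w.$<>]+)\.([\w$<>]+)\(([^:()]+)(?::(\d+))?\)"
)


@dataclass(frozen=True)
class Frame:
    class_name: str
    method: str
    file: str
    line: Optional[int]


def parse_trace(text: str) -> Tuple[str, List[Frame]]:
    """Split a stack trace into its exception type and its frames."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # The exception type is the last token before the first ':' on line 1.
    exception_type = lines[0].split(":")[0].strip().split()[-1]
    frames = []
    for ln in lines[1:]:
        m = FRAME_RE.match(ln)
        if m:
            line_no = int(m.group(4)) if m.group(4) else None
            frames.append(Frame(m.group(1), m.group(2), m.group(3), line_no))
    return exception_type, frames


def residence(frames: List[Frame], faulty_class: str, faulty_method: str) -> str:
    """Label a known crash: does the faulty method appear in the trace?"""
    for f in frames:
        if f.class_name == faulty_class and f.method == faulty_method:
            return "InTrace"   # hypothetical label name
    return "OutTrace"          # hypothetical label name


def trace_features(exception_type: str, frames: List[Frame]) -> dict:
    """A few illustrative trace-level features (assumed, not the paper's)."""
    return {
        "exception_type": exception_type,
        "n_frames": len(frames),
        "n_distinct_classes": len({f.class_name for f in frames}),
        "top_class": frames[0].class_name if frames else None,
    }


trace = """java.lang.NullPointerException: boom
    at org.example.Parser.parse(Parser.java:42)
    at org.example.Main.run(Main.java:10)
    at org.example.Main.main(Main.java:5)"""

exc, frames = parse_trace(trace)
features = trace_features(exc, frames)
label = residence(frames, "org.example.Parser", "parse")
```

For known crashes, pairs of `features` and `label` would form the training data for a classifier. For a newly submitted crash only the features are available, and the trained model predicts the label.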
