AmaLgam+: Composing Rich Information Sources for Accurate Bug Localization

During the evolution of a software system, a large number of bug reports are submitted. Locating the source code files that must be fixed to resolve these bugs is a challenging problem, so there is a need for a technique that can identify the buggy files automatically. A number of bug localization solutions have been proposed in the literature; each takes a bug report as input and outputs a list of files ranked by their likelihood of being buggy. However, the accuracy of these tools still needs to be improved. In this paper, to address this need, we propose AmaLgam+, a method for locating relevant buggy files that combines five sources of information, namely, version history, similar reports, structure, stack traces, and reporter information. We perform a large-scale experiment on four open source projects, namely, AspectJ, Eclipse, SWT, and ZXing, to localize more than 3000 bugs. We compare AmaLgam+ with several state-of-the-art approaches, including AmaLgam, BLUiR+, BRtracer+, BugLocator, and TFIDF-DHbPd. These approaches leverage one or several of the sources of information analyzed by AmaLgam+, but not all of them. On average, AmaLgam+ achieves a 6.0% improvement in terms of Mean Average Precision (MAP) over AmaLgam, which merges three sources of information. For the AspectJ and Eclipse datasets, in which many bug reports contain stack traces and many reporters submit multiple bug reports, AmaLgam+ achieves a 12.0% improvement over AmaLgam in terms of MAP. Compared with the other state-of-the-art approaches, AmaLgam+ achieves improvements of 20.3%, 22.5%, 33.1%, and 73.9% over BLUiR+, BRtracer+, BugLocator, and TFIDF-DHbPd in terms of MAP, respectively. Copyright © 2016 John Wiley & Sons, Ltd.
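
For readers unfamiliar with the metric, the following is a minimal sketch of how MAP is commonly computed for ranked file lists. It assumes the standard information-retrieval definition, in which the average precision of one bug report is the mean of the precision values at the ranks of its truly buggy files, and MAP is the mean of those values over all reports. The function names and the sample file names are hypothetical and are not taken from the paper.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranked_files: Sequence[str], buggy_files: Set[str]) -> float:
    """Average precision for one bug report (standard IR definition, assumed here)."""
    if not buggy_files:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, path in enumerate(ranked_files, start=1):
        if path in buggy_files:
            hits += 1
            # Precision at this rank: buggy files seen so far / files inspected so far.
            precision_sum += hits / rank
    # Normalize by the number of truly buggy files for this report.
    return precision_sum / len(buggy_files)


def mean_average_precision(results: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of the per-report average precision values."""
    scores = [average_precision(ranked, buggy) for ranked, buggy in results]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Two hypothetical bug reports with illustrative file names.
    reports = [
        (["A.java", "B.java", "C.java", "D.java"], {"A.java", "C.java"}),
        (["X.java", "Y.java"], {"Y.java"}),
    ]
    print(mean_average_precision(reports))
```

Under this definition, a technique scores higher when the buggy files appear nearer the top of each ranked list, which is why MAP is a natural measure for comparing ranking-based localizers.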
