A topic-based approach for narrowing the search space of buggy files from a bug report

Locating buggy code is a time-consuming task in software development. Given a new bug report, developers must search through a large number of files in a project to locate buggy code. We propose BugScout, an automated approach to help developers reduce such efforts by narrowing the search space of buggy files when they are assigned to address a bug report. BugScout assumes that the textual contents of a bug report and that of its corresponding source code share some technical aspects of the system which can be used for locating buggy source files given a new bug report. We develop a specialized topic model that represents those technical aspects as topics in the textual contents of bug reports and source files, and correlates bug reports and corresponding buggy files via their shared topics. Our evaluation shows that BugScout can recommend buggy files correctly up to 45% of the cases with a recommended ranked list of 10 files.

[1]  Andreas Zeller,et al.  Mining metrics to predict component failures , 2006, ICSE.

[2]  Romain Robbes,et al.  Linking e-mails and source code artifacts , 2010, 2010 ACM/IEEE 32nd International Conference on Software Engineering.

[3]  Nachiappan Nagappan,et al.  Predicting defects using network analysis on dependency graphs , 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[4]  Ahmed E. Hassan,et al.  Predicting faults using the complexity of code changes , 2009, 2009 IEEE 31st International Conference on Software Engineering.

[5]  Witold Pedrycz,et al.  A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction , 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[6]  Letha H. Etzkorn,et al.  Bug localization using latent Dirichlet allocation , 2010, Inf. Softw. Technol..

[7]  Andreas Zeller,et al.  Predicting faults from cached history , 2008, ISEC '08.

[8]  Denys Poshyvanyk,et al.  Feature location via information retrieval based filtering of a single scenario execution trace , 2007, ASE.

[9]  Andrian Marcus,et al.  Recovering documentation-to-source-code traceability links using latent semantic indexing , 2003, 25th International Conference on Software Engineering, 2003. Proceedings..

[10]  A. Raftery,et al.  How Many Iterations in the Gibbs Sampler , 1991 .

[11]  Tim Menzies,et al.  Data Mining Static Code Attributes to Learn Defect Predictors , 2007, IEEE Transactions on Software Engineering.

[12]  Harald C. Gall,et al.  EQ-Mine: Predicting Short-Term Defects for Software Evolution , 2007, FASE.

[13]  Andreas Zeller,et al.  Change Bursts as Defect Predictors , 2010, 2010 IEEE 21st International Symposium on Software Reliability Engineering.

[14]  Gerardo Canfora,et al.  How Software Repositories can Help in Resolving a New Change Request , 2005 .

[15]  Elaine J. Weyuker,et al.  Predicting the location and number of faults in large software systems , 2005, IEEE Transactions on Software Engineering.

[16]  Thomas Zimmermann,et al.  Extraction of bug localization benchmarks from history , 2007, ASE.

[17]  Gail C. Murphy,et al.  Who should fix this bug? , 2006, ICSE.

[18]  Richard N. Taylor,et al.  Software traceability with topic modeling , 2010, 2010 ACM/IEEE 32nd International Conference on Software Engineering.

[19]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[20]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[21]  Elaine J. Weyuker,et al.  Looking for bugs in all the right places , 2006, ISSTA '06.

[22]  Andreas Zeller,et al.  HATARI: raising risk awareness , 2005, ESEC/FSE-13.

[23]  Sushil Krishna Bajracharya,et al.  Mining concepts from code with probabilistic topic models , 2007, ASE.