Bug Localization via Supervised Topic Modeling

Bug tracking systems, which help to track the reported software bugs, have been widely used in software development and maintenance. In these systems, recognizing relevant source files among a large number of source files for a given bug report is a time-consuming and labor-intensive task for software developers. To tackle this problem, information retrieval methods have been widely used to capture either the textual similarities or the semantic similarities between bug reports and source files. However, these two types of similarities are usually considered separately and the historical bug fixings are largely ignored by the existing methods. In this paper, we propose a supervised topic modeling method (STMLOCATOR) for automatically locating the relevant source files for a given bug report. In particular, the proposed model is built upon three key observations. First, supervised modeling can effectively make use of the existing fixing histories. Second, certain words in bug reports tend to appear multiple times in their relevant source files. Third, longer source files tend to have more bugs. By integrating the above three observations, the proposed STMLOCATOR utilizes historical fixings in a supervised way and learns both the textual similarities and semantic similarities between bug reports and source files. We further consider a special type of bug reports with stack-traces in bug reports, and propose a variant of STMLOCATOR to tailor for such bug reports. Experimental evaluations on three real data sets demonstrate that the proposed STMLOCATOR can achieve up to 23.6% improvement in terms of prediction accuracy over its best competitors, and scales linearly with the size of the data. Moreover, the proposed variant further improves STMLOCATOR by up to 76.2% on those bug reports with stack-traces.

[1]  Lu Zhang,et al.  Boosting Bug-Report-Oriented Fault Localization with Segmentation and Stack-Trace Analysis , 2014, 2014 IEEE International Conference on Software Maintenance and Evolution.

[2]  Maosong Sun,et al.  Tag-LDA for Scalable Real-time Tag Recommendation , 2009 .

[3]  Martin Monperrus,et al.  Learning to Combine Multiple Ranking Metrics for Fault Localization , 2014, 2014 IEEE International Conference on Software Maintenance and Evolution.

[4]  Yan Xiao,et al.  Improving Bug Localization with an Enhanced Convolutional Neural Network , 2017, 2017 24th Asia-Pacific Software Engineering Conference (APSEC).

[5]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[6]  Rongxin Wu,et al.  CrashLocator: locating crashing faults based on crash stacks , 2014, ISSTA 2014.

[7]  Max Welling,et al.  Fast collapsed gibbs sampling for latent dirichlet allocation , 2008, KDD.

[8]  Gerard Salton,et al.  A vector space model for automatic indexing , 1975, CACM.

[9]  David Lo,et al.  Information retrieval and spectrum based bug localization: better together , 2015, ESEC/SIGSOFT FSE.

[10]  Ramesh Nallapati,et al.  Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora , 2009, EMNLP.

[11]  David Lo,et al.  Network-Clustered Multi-Modal Bug Localization , 2018, IEEE Transactions on Software Engineering.

[12]  Mary Jean Harrold,et al.  Empirical evaluation of the tarantula automatic fault-localization technique , 2005, ASE.

[13]  David Lo,et al.  Version history, similar report, and structure: putting them together for improved bug localization , 2014, ICPC 2014.

[14]  Eunseok Lee,et al.  Bug Localization Based on Code Change Histories and Bug Reports , 2015, 2015 Asia-Pacific Software Engineering Conference (APSEC).

[15]  A.J.C. van Gemund,et al.  On the Accuracy of Spectrum-based Fault Localization , 2007, Testing: Academic and Industrial Conference Practice and Research Techniques - MUTATION (TAICPART-MUTATION 2007).

[16]  Yan Xiao,et al.  Machine translation-based bug localization technique for bridging lexical gap , 2018, Inf. Softw. Technol..

[17]  Yee Whye Teh,et al.  On Smoothing and Inference for Topic Models , 2009, UAI.

[18]  David Lo,et al.  Will this localization tool be effective for this bug? Mitigating the impact of unreliability of information retrieval based bug localization tools , 2016, Empirical Software Engineering.

[19]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[20]  Anh Tuan Nguyen,et al.  Combining Deep Learning with Information Retrieval to Localize Buggy Files for Bug Reports (N) , 2015, 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[21]  Hung Viet Nguyen,et al.  A topic-based approach for narrowing the search space of buggy files from a bug report , 2011, 2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011).

[22]  Ming Li,et al.  Enhancing the Unified Features to Locate Buggy Files by Exploiting the Sequential Nature of Source Code , 2017, IJCAI.

[23]  Razvan C. Bunescu,et al.  Learning to rank relevant files for bug reports using domain knowledge , 2014, SIGSOFT FSE.

[24]  Xiao Ma,et al.  From Word Embeddings to Document Similarities for Improved Information Retrieval in Software Engineering , 2016, 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE).

[25]  Thomas Zimmermann,et al.  Improving bug triage with bug tossing graphs , 2009, ESEC/FSE '09.

[26]  Letha H. Etzkorn,et al.  Bug localization using latent Dirichlet allocation , 2010, Inf. Softw. Technol..

[27]  Yann-Gaël Guéhéneuc,et al.  Feature Location Using Probabilistic Ranking of Methods Based on Execution Scenarios and Information Retrieval , 2007, IEEE Transactions on Software Engineering.

[28]  Jian Zhou,et al.  Where should the bugs be fixed? More accurate information retrieval-based bug localization based on bug reports , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[29]  David Lo,et al.  Automatic, high accuracy prediction of reopened bugs , 2014, Automated Software Engineering.

[30]  Sarfraz Khurshid,et al.  Improving bug localization using structured information retrieval , 2013, 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[31]  Shaohua Wang,et al.  Improving bug localization using correlations in crash reports , 2013, 2013 10th Working Conference on Mining Software Repositories (MSR).

[32]  Zhi-Hua Zhou,et al.  Learning Unified Features from Natural and Programming Languages for Locating Buggy Source Code , 2016, IJCAI.

[33]  Chao Liu,et al.  SOBER: statistical model-based bug localization , 2005, ESEC/FSE-13.

[34]  Anh Tuan Nguyen,et al.  Bug Localization with Combination of Deep Learning and Information Retrieval , 2017, 2017 IEEE/ACM 25th International Conference on Program Comprehension (ICPC).