Faceted Bug Report Search with Topic Model

During bug reporting, the same bug may be reported repeatedly, so extra time is spent on bug triaging and fixing. To reduce this redundant effort, bug reporters need to be able to search previously reported bugs efficiently and accurately. Existing bug tracking systems use relatively simple ranking functions, which often produce unsatisfactory results. In this paper, we apply Ranking SVM, a Learning to Rank technique, to construct a ranking model for accurate bug report search. A topic model then clusters the search results into multiple facets, each containing similar bug reports on the same topic, so that users and testers can locate relevant bugs more efficiently through a simple query. We evaluate the approach on more than 16,340 Eclipse and Mozilla bug reports. The results show that the proposed approach achieves better search results than the existing search functions.
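The two stages described above can be illustrated with a minimal sketch. It is not the paper's implementation: the feature vectors, the report ids, the topic distributions, and all function names are hypothetical. The ranker minimizes the pairwise hinge loss that Ranking SVM is based on, using plain subgradient descent rather than a dedicated SVM solver. The facet step assumes that a topic model has already produced a per-report topic distribution, and assigns each report to its dominant topic.

```python
import random
from collections import defaultdict


def train_ranking_svm(pairs, dim, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Learn a linear scoring vector from preference pairs.

    pairs: list of (better_features, worse_features); the first report
    should rank above the second for the same query.
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    pairs = list(pairs)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # L2 regularization step (shrinks the weights slightly).
            w = [wi * (1.0 - lr * reg) for wi in w]
            # Hinge loss subgradient: update only when the pair is
            # ranked wrongly or with a margin below 1.
            if margin < 1.0:
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


def score(w, features):
    """Linear relevance score of one report."""
    return sum(wi * xi for wi, xi in zip(w, features))


def group_into_facets(ranked_ids, topic_dist):
    """Group ranked reports by dominant topic, keeping rank order."""
    facets = defaultdict(list)
    for report_id in ranked_ids:
        dist = topic_dist[report_id]
        dominant = max(range(len(dist)), key=dist.__getitem__)
        facets[dominant].append(report_id)
    return dict(facets)


# Hypothetical features: (title similarity, description similarity)
# between a query and each candidate bug report.
features = {
    "A": [0.9, 0.8],
    "B": [0.2, 0.1],
    "C": [0.6, 0.7],
    "D": [0.1, 0.3],
}
# Hypothetical relevance judgments: A and C are relevant, B and D are not.
training_pairs = [
    (features["A"], features["B"]),
    (features["C"], features["D"]),
    (features["A"], features["D"]),
    (features["C"], features["B"]),
]
w = train_ranking_svm(training_pairs, dim=2)
ranked = sorted(features, key=lambda r: score(w, features[r]), reverse=True)

# Hypothetical per-report topic distributions over two topics, standing in
# for the output of a fitted topic model.
topic_dist = {
    "A": [0.8, 0.2],
    "B": [0.3, 0.7],
    "C": [0.1, 0.9],
    "D": [0.6, 0.4],
}
facets = group_into_facets(ranked, topic_dist)
print(ranked)
print(facets)
```

Because the facets are built from the ranked list, each facet keeps the ranker's ordering, so the most relevant report of each topic appears first within its group.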
