Search-based duplicate defect detection: An industrial experience

Duplicate defects put extra overheads on software organizations, as the cost and effort of managing duplicate defects are mainly redundant. Due to the use of natural language and various ways to describe a defect, it is usually hard to investigate duplicate defects automatically. This problem is more severe in large software organizations with huge defect repositories and massive number of defect reporters. Ideally, an efficient tool should prevent duplicate reports from reaching developers by automatically detecting and/or filtering duplicates. It also should be able to offer defect triagers a list of top-N similar bug reports and allow them to compare the similarity of incoming bug reports with the suggested duplicates. This demand has motivated us to design and develop a search-based duplicate bug detection framework at BlackBerry. The approach follows a generalized process model to evaluate and tune the performance of the system in a systematic way. We have applied the framework on software projects at BlackBerry, in addition to the Mozilla defect repository. The experimental results exhibit the performance of the developed framework and highlight the high impact of parameter tuning on its performance.

[1]  Siau-Cheng Khoo,et al.  Towards more accurate retrieval of duplicate bug reports , 2011, 2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011).

[2]  David Lo,et al.  Improved Duplicate Bug Report Identification , 2012, 2012 16th European Conference on Software Maintenance and Reengineering.

[3]  Tao Xie,et al.  An approach to detecting duplicate bug reports using natural language and execution information , 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[4]  Zbigniew Michalewicz,et al.  Parameter Setting in Evolutionary Algorithms , 2007, Studies in Computational Intelligence.

[5]  Javier Parapar,et al.  Finding the Best Parameter Setting Particle Swarm Optimisation , 2012 .

[6]  Thomas Zimmermann,et al.  Duplicate bug reports considered harmful … really? , 2008, 2008 IEEE International Conference on Software Maintenance.

[7]  Eric Monfroy,et al.  Autonomous Search , 2012, Springer Berlin Heidelberg.

[8]  Siau-Cheng Khoo,et al.  A discriminative model approach for accurate duplicate bug report retrieval , 2010, 2010 ACM/IEEE 32nd International Conference on Software Engineering.

[9]  Daniel Lucrédio,et al.  An Initial Study on the Bug Report Duplication Problem , 2010, 2010 14th European Conference on Software Maintenance and Reengineering.

[10]  Zbigniew Michalewicz,et al.  Parameter Control in Evolutionary Algorithms , 2007, Parameter Setting in Evolutionary Algorithms.

[11]  Mark Harman,et al.  The relationship between search based software engineering and predictive modeling , 2010, PROMISE '10.

[12]  Sriram K. Rajamani,et al.  DebugAdvisor: a recommender system for debugging , 2009, ESEC/FSE '09.

[13]  Hinrich Schütze,et al.  Introduction to information retrieval , 2008 .

[14]  Otis Gospodnetic,et al.  Lucene in Action , 2004 .

[15]  David Lo,et al.  Duplicate bug report detection with a combination of information retrieval and topic modeling , 2012, 2012 Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering.

[16]  John D. Lafferty,et al.  A study of smoothing methods for language models applied to Ad Hoc information retrieval , 2001, SIGIR '01.

[17]  Per Runeson,et al.  Detection of Duplicate Defect Reports Using Natural Language Processing , 2007, 29th International Conference on Software Engineering (ICSE'07).

[18]  Nicholas Jalbert,et al.  Automated duplicate detection for bug tracking systems , 2008, 2008 IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN).

[19]  Mark Harman,et al.  Search Based Software Engineering: Techniques, Taxonomy, Tutorial , 2010, LASER Summer School.

[20]  A. E. Eiben,et al.  Evolutionary Algorithm Parameters and Methods to Tune Them , 2012, Autonomous Search.

[21]  Thomas Zimmermann,et al.  Towards the next generation of bug tracking systems , 2008, 2008 IEEE Symposium on Visual Languages and Human-Centric Computing.

[22]  Ladan Tahvildari,et al.  A Comparative Study of the Performance of IR Models on Duplicate Bug Detection , 2012, 2012 16th European Conference on Software Maintenance and Reengineering.

[23]  David G. Luenberger,et al.  Linear and nonlinear programming , 1984 .

[24]  Kai Ming Ting,et al.  Precision and Recall , 2017, Encyclopedia of Machine Learning and Data Mining.

[25]  Stephen G. MacDonell,et al.  Evaluating prediction systems in software project estimation , 2012, Inf. Softw. Technol..