Duplicate Bug Report Detection Using Dual-Channel Convolutional Neural Networks

Developers rely on bug reports to fix bugs. Bug reports are usually stored and managed in bug tracking systems. Because reporters have different habits of expression, they may describe the same bug in different words. As a result, a bug tracking system often contains many duplicate bug reports. Automatically detecting these duplicates would save a large amount of bug analysis effort. Prior studies have found that deep-learning techniques are effective for duplicate bug report detection. Inspired by recent Natural Language Processing (NLP) research, in this paper we propose a duplicate bug report detection approach based on Dual-Channel Convolutional Neural Networks (DC-CNN). We present a novel representation of a bug report pair, the dual-channel matrix, formed by concatenating two single-channel matrices, each representing one bug report. These dual-channel matrices are fed to a CNN model to capture the correlated semantic relationships between the two bug reports. Our approach then uses these association features to classify whether a pair of bug reports is duplicate or not. We evaluate our approach on three large datasets from three open-source projects (Open Office, Eclipse, and Net Beans) and on a larger combined dataset; the classification accuracy reaches 0.9429, 0.9685, 0.9534, and 0.9552, respectively. Our approach outperforms two state-of-the-art approaches that also use deep-learning techniques. The results indicate that our dual-channel matrix representation is effective for duplicate bug report detection.
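The dual-channel representation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the word embedding, the padding scheme, or the matrix dimensions, so the random embedding table, the zero padding, the zero rows for unknown words, and the names `report_to_matrix` and `dual_channel` are all assumptions made for illustration. The sketch only shows how two single-channel matrices of equal size could be placed side by side as two channels before being passed to a CNN.

```python
import random

def report_to_matrix(tokens, embeddings, max_len, dim):
    """Turn one tokenized bug report into a max_len x dim single-channel matrix.

    Assumptions of this sketch: reports longer than max_len are truncated,
    shorter ones are zero-padded, and unknown words become zero rows.
    """
    rows = [list(embeddings.get(tok, [0.0] * dim)) for tok in tokens[:max_len]]
    rows += [[0.0] * dim for _ in range(max_len - len(rows))]
    return rows

def dual_channel(tokens_a, tokens_b, embeddings, max_len=8, dim=4):
    """Pair two single-channel matrices as a 2 x max_len x dim nested list."""
    return [
        report_to_matrix(tokens_a, embeddings, max_len, dim),
        report_to_matrix(tokens_b, embeddings, max_len, dim),
    ]

# Hypothetical toy vocabulary with random 4-dimensional word vectors.
random.seed(0)
vocab = ["app", "crashes", "on", "startup", "program", "fails", "at", "launch"]
embeddings = {w: [random.uniform(-1.0, 1.0) for _ in range(4)] for w in vocab}

pair = dual_channel(
    ["app", "crashes", "on", "startup"],
    ["program", "fails", "at", "launch"],
    embeddings,
)
print(len(pair), len(pair[0]), len(pair[0][0]))  # channels, rows, columns
```

In a real pipeline the nested list would typically be converted to a tensor and the embedding table would come from a trained word-vector model; both choices are outside what the abstract states.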