Poster: DWEN: Deep Word Embedding Network for Duplicate Bug Report Detection in Software Repositories

Filing bug reports is a major part of software maintenance. Because bug filing is asynchronous, the same bug is often reported more than once, producing duplicate reports. Detecting these duplicates is important, since the same bug should not be assigned to different developers. In this poster, we present the Deep Word Embedding Network (DWEN), which computes the similarity between two bug reports for the task of duplicate bug report detection. We propose a two-step model that calculates the similarity between two bug reports using word embeddings and a deep neural network. We run experiments on two large datasets, from the Mozilla project and the Open Office project, and compare the proposed approach with baselines and related approaches. This initial work shows that combining word embeddings with deep neural networks can improve duplicate bug report detection.
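
The abstract does not describe the architecture in detail, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes that each report is represented by the average of its word vectors, that the two report vectors are concatenated, and that a small feed-forward network maps the pair to a score in (0, 1). The embeddings are random stand-ins for learned ones, the network weights are untrained, and all names (build_embeddings, report_vector, init_network, similarity) are hypothetical.

```python
import numpy as np

EMBED_DIM = 8  # assumed toy size; learned embeddings are usually much larger


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumed, simplistic preprocessing step)."""
    return text.lower().split()


def build_embeddings(reports, dim=EMBED_DIM, seed=0):
    """Stand-in for step 1: random vectors in place of learned word embeddings."""
    rng = np.random.default_rng(seed)
    vocab = sorted({tok for report in reports for tok in tokenize(report)})
    return {tok: rng.normal(size=dim) for tok in vocab}


def report_vector(text, embeddings, dim=EMBED_DIM):
    """Represent a report as the mean of its known word vectors."""
    vecs = [embeddings[tok] for tok in tokenize(text) if tok in embeddings]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)


def init_network(input_dim, hidden_dim=16, seed=1):
    """Untrained weights for a one-hidden-layer network (assumed architecture)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(scale=0.1, size=(input_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(scale=0.1, size=(hidden_dim, 1)),
        "b2": np.zeros(1),
    }


def similarity(report_a, report_b, embeddings, params):
    """Step 2: score a pair of reports with the feed-forward network."""
    x = np.concatenate([
        report_vector(report_a, embeddings),
        report_vector(report_b, embeddings),
    ])
    hidden = np.maximum(0.0, x @ params["W1"] + params["b1"])  # ReLU
    logit = hidden @ params["W2"] + params["b2"]
    return float(1.0 / (1.0 + np.exp(-logit[0])))  # sigmoid


# Made-up example reports, for illustration only.
reports = [
    "Browser crashes when opening a new tab",
    "Crash on opening new tab in browser",
    "Spell checker ignores custom dictionary",
]
embeddings = build_embeddings(reports)
params = init_network(2 * EMBED_DIM)
score = similarity(reports[0], reports[1], embeddings, params)
```

Because the weights are untrained, the score carries no meaning yet. In a real system the network would be trained on labelled duplicate and non-duplicate pairs, and the embeddings would be learned from the bug report corpus.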
