Duplicate Bug Report Detection and Classification System Based on Deep Learning Technique

Duplicate bug report detection is the process of identifying reports in a bug tracking system that describe a bug which has already been reported. It is essential for avoiding unnecessary work and rediscovery of known issues. In typical bug tracking systems, thousands of duplicate bug reports are filed every day, which increases human cost, effort and time and makes duplicate detection an important problem in the software management process. Automating duplicate bug report detection reduces this manual effort and thereby increases the productivity of triagers and developers. It also speeds up the software management process and, as a result, reduces software maintenance cost. However, existing systems are not yet sufficiently accurate, even though they use a variety of machine learning approaches. In this work, an automatic duplicate bug report detection and classification model based on deep learning is proposed. The proposed system has three modules: Preprocessing, Deep Learning Model, and Duplicate Bug Report Detection and Classification. The deep learning module uses a Convolutional Neural Network to extract relevant features from bug reports. These features are used to identify the features that bug reports have in common, and the similarity between bug reports is then computed from those shared features. The performance of the proposed system is evaluated on six publicly available datasets using six performance metrics. The proposed system outperforms existing systems, achieving an accuracy between 85% and 99% and a recall@k between 79% and 94%. The effectiveness of the proposed system is also measured with cross-training on datasets from the same domain and from different domains. The proposed system achieves high accuracy on same-domain datasets and low accuracy on different-domain datasets.
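To make the similarity step and the recall@k metric concrete, the following is a minimal sketch and not the proposed system itself. It assumes each bug report has already been mapped to a feature vector (in the paper this is done by the Convolutional Neural Network; here the vectors are hand-made toy values), and it assumes cosine similarity as the similarity measure, which the abstract does not specify. The function names `cosine`, `top_k` and `recall_at_k` are hypothetical. Recall@k is taken in its usual sense for this task: the fraction of duplicate reports whose original (master) report appears among the k most similar candidates.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def top_k(query_vec, candidates, k):
    """Return the ids of the k candidate reports most similar to the query."""
    ranked = sorted(candidates,
                    key=lambda rid: cosine(query_vec, candidates[rid]),
                    reverse=True)
    return ranked[:k]

def recall_at_k(queries, candidates, k):
    """Fraction of queries whose master report id is in the top-k list."""
    hits = sum(1 for vec, master in queries
               if master in top_k(vec, candidates, k))
    return hits / len(queries)

# Toy data (assumed): feature vectors of existing reports, keyed by report id.
candidates = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0], "C": [0.6, 0.8, 0.0]}
# Each query is (feature vector of a new duplicate report, id of its master).
queries = [([0.9, 0.1, 0.0], "A"), ([0.5, 0.6, 0.0], "B")]

print(recall_at_k(queries, candidates, 1))  # 0.5
print(recall_at_k(queries, candidates, 2))  # 1.0
```

In this toy data the second query ranks report C above its true master B, so it counts as a miss at k = 1 and as a hit at k = 2.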
