Data reduction for bug triage using effective prediction of reduction order techniques

A large open source project accumulates a large number of bug reports. In open source projects, bug reports are publicly available and can be modified by anyone. Bugs are software defects that are highly difficult to predict. Bug triage is the process of assigning a bug to the appropriate developer. To automate it, machine learning classifiers have been proposed. Such a classifier learns from past bug reports and the developers who handled them which type of report suits each developer. The techniques involved include preprocessing, a machine learning classifier, instance selection, and feature selection. The aim of this paper is to reduce the bug data set used in bug triage by including representative values along with the statistical values of the bug data set. Our work uses the data set from the open source project Eclipse. We focus on reducing the scale of the data and thereby improving accuracy. This is achieved by building a representative model that predicts the reduction order, using the bug report summary and metadata. Our proposed approach attains an accuracy of 96.5%, which is better than that of existing work.
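The abstract names the building blocks of data reduction for bug triage: feature selection, instance selection, a classifier, and the order in which the two reductions are applied. The following is a minimal standard-library Python sketch of that pipeline. It makes several assumptions, none of which is confirmed as this work's method. Chi-square scoring stands in for feature selection. Wilson's edited nearest neighbour rule, with Jaccard similarity, stands in for instance selection. Multinomial naive Bayes serves as the classifier. The toy reports and every name (`reduce_data`, `best_order`, `TRAIN`, `TEST`, and so on) are hypothetical. The sketch also chooses the order empirically on held-out reports. The approach described above instead builds a model that predicts the order.

```python
import math
from collections import Counter, defaultdict

# Hypothetical report format: (words of the summary, assigned developer).
Report = tuple[list[str], str]


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def chi2_scores(data: list[Report]) -> dict[str, float]:
    """Score each word by its largest chi-square statistic over developers."""
    n = len(data)
    class_sizes = Counter(label for _, label in data)
    doc_freq: Counter = Counter()
    per_class: defaultdict = defaultdict(Counter)  # word -> developer -> doc count
    for words, label in data:
        for w in set(words):
            doc_freq[w] += 1
            per_class[w][label] += 1
    scores = {}
    for w, df in doc_freq.items():
        best = 0.0
        for label, n_c in class_sizes.items():
            a = per_class[w][label]  # this developer, contains w
            b = df - a               # other developers, contains w
            c = n_c - a              # this developer, lacks w
            d = n - n_c - b          # other developers, lacks w
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[w] = best
    return scores


def select_features(data: list[Report], k: int) -> list[Report]:
    """Feature selection: keep only the k highest-scoring words (ties broken alphabetically)."""
    scores = chi2_scores(data)
    keep = set(sorted(scores, key=lambda w: (-scores[w], w))[:k])
    return [([w for w in words if w in keep], label) for words, label in data]


def jaccard(x: list[str], y: list[str]) -> float:
    sx, sy = set(x), set(y)
    union = sx | sy
    return len(sx & sy) / len(union) if union else 0.0


def select_instances(data: list[Report]) -> list[Report]:
    """Instance selection (edited nearest neighbour): drop a report whose
    most similar other report belongs to a different developer."""
    kept = []
    for i, (words, label) in enumerate(data):
        others = [r for j, r in enumerate(data) if j != i]
        if not others or max(others, key=lambda r: jaccard(words, r[0]))[1] == label:
            kept.append((words, label))
    return kept


def reduce_data(data: list[Report], order: str, k: int) -> list[Report]:
    if order == "FS->IS":
        return select_instances(select_features(data, k))
    if order == "IS->FS":
        return select_features(select_instances(data), k)
    raise ValueError(f"unknown reduction order: {order}")


def train_nb(data: list[Report]):
    priors = Counter(label for _, label in data)
    word_counts: defaultdict = defaultdict(Counter)
    vocab: set[str] = set()
    for words, label in data:
        word_counts[label].update(words)
        vocab.update(words)
    return priors, word_counts, vocab


def predict_nb(model, words: list[str]) -> str:
    """Multinomial naive Bayes with add-one smoothing; unknown words are ignored."""
    priors, word_counts, vocab = model
    total = sum(priors.values())
    best_label, best_score = None, -math.inf
    for label, count in priors.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(count / total)
        for w in words:
            if w in vocab:
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, test: list[Report]) -> float:
    return sum(predict_nb(model, w) == label for w, label in test) / len(test)


def best_order(train: list[Report], test: list[Report], k: int) -> str:
    """Pick the order with higher held-out accuracy (ties go to FS->IS)."""
    orders = ("FS->IS", "IS->FS")
    return max(orders, key=lambda o: accuracy(train_nb(reduce_data(train, o, k)), test))


# Hypothetical toy data, not taken from the Eclipse data set.
TRAIN = [(tokenize(t), dev) for t, dev in [
    ("swt widget crash", "alice"), ("swt layout broken", "alice"),
    ("swt widget paint", "alice"), ("debugger breakpoint hang", "bob"),
    ("debugger breakpoint ignored", "bob"), ("debugger step hang", "bob"),
]]
TEST = [(tokenize("swt widget freeze"), "alice"), (tokenize("breakpoint hang"), "bob")]

if __name__ == "__main__":
    for order in ("FS->IS", "IS->FS"):
        print(order, len(reduce_data(TRAIN, order, k=4)), "reports kept")
    print("chosen order:", best_order(TRAIN, TEST, k=4))
```

On this toy data the two orders happen to produce the same reduced set. On real bug data they generally differ, and choosing between them is the reduction-order problem the abstract refers to.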
