Surprise Bug Report Prediction Utilizing Optimized Integration with Imbalanced Learning Strategy

In software projects, large numbers of bugs are reported to bug repositories. Because of limited budgets and workforce, developers often lack the time and capacity to inspect every reported bug, so they focus on inspecting and repairing high-impact bugs. Among high-impact bugs, surprise bugs have been reported to pose a severe threat to software systems, even though they account for only a small proportion of all bugs. Identifying surprise bugs is therefore an important task in practice. In recent years, researchers have proposed several methods to identify surprise bugs, but their performance remains unsatisfactory for real software projects. The main reason is that surprise bugs make up only a small percentage of all bugs, and this imbalanced distribution makes them difficult to detect. To overcome the imbalanced class distribution of bugs, this paper presents a machine-learning method for predicting surprise bugs. The method uses the textual features of bug reports and applies an imbalanced learning strategy to balance the bug report datasets. The balanced datasets are then used to train three classifiers, each built with a different classification algorithm, which predict the class of unlabeled bug reports. In particular, we propose an ensemble method named optimization integration that derives a single, optimized prediction from the results of the three classifiers. This ensemble method adjusts each classifier's sensitivity to the different categories according to the characteristics of each project, and it integrates the strengths of the three classifiers. Experiments on datasets from four software projects show that the method outperforms previous methods in detecting surprise bugs.
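The abstract does not name the imbalanced learning strategy, the three classification algorithms, or the exact rule optimization integration uses, so the following is only a minimal sketch under stated assumptions. It assumes random oversampling as the balancing strategy. It also reads optimization integration as a grid search over per-classifier weights and a decision threshold that maximizes F1 for the surprise-bug class on a project's validation data. The functions `random_oversample`, `combine`, and `optimize_integration`, the weight and threshold grids, and the example scores are all hypothetical. The three classifiers are represented only by the surprise-bug probabilities they output.

```python
import itertools
import random
from typing import List, Sequence, Tuple


def random_oversample(X: List[list], y: List[int], seed: int = 0) -> Tuple[List[list], List[int]]:
    """Balance a binary dataset by duplicating random minority-class rows (assumed strategy)."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        raise ValueError("both classes must be present")
    # Draw minority indices with replacement until both classes have equal counts.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = pos + neg + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive (surprise-bug) class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def combine(scores: Sequence[Sequence[float]], weights: Sequence[float], threshold: float) -> List[int]:
    """Weighted average of per-classifier surprise-bug probabilities, cut at `threshold`."""
    total = sum(weights)
    n = len(scores[0])
    return [
        1 if sum(w * s[i] for w, s in zip(weights, scores)) / total >= threshold else 0
        for i in range(n)
    ]


def optimize_integration(
    scores: Sequence[Sequence[float]],
    y_val: Sequence[int],
    weight_grid: Sequence[float] = (0.0, 0.5, 1.0),
    thresholds: Sequence[float] = (0.3, 0.4, 0.5, 0.6, 0.7),
):
    """Hypothetical reading: grid-search weights and a threshold that maximize validation F1."""
    best_f1, best_w, best_t = -1.0, None, None
    for weights in itertools.product(weight_grid, repeat=len(scores)):
        if sum(weights) == 0:
            continue  # all-zero weights would divide by zero in combine()
        for t in thresholds:
            f1 = f1_score(y_val, combine(scores, weights, t))
            if f1 > best_f1:
                best_f1, best_w, best_t = f1, weights, t
    return best_f1, best_w, best_t


if __name__ == "__main__":
    X, y = random_oversample([[i] for i in range(10)], [1, 1] + [0] * 8)
    print(len(y), sum(y))  # 16 8
    # Hypothetical validation-set probabilities from three already-trained classifiers.
    y_val = [1, 0, 0, 0, 1, 0]
    scores = [
        [0.9, 0.2, 0.1, 0.4, 0.8, 0.3],
        [0.6, 0.7, 0.2, 0.1, 0.3, 0.2],
        [0.4, 0.1, 0.6, 0.2, 0.7, 0.5],
    ]
    print(optimize_integration(scores, y_val))
```

Rerunning the search on each project's own validation data is one way to obtain the project-specific adjustment described above. The weights and threshold should be tuned on held-out data, never on the test set. A grid search keeps the sketch dependency-free. Finer grids or other optimizers would also fit this reading.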
