An adaptive approach with active learning in software fault prediction

Background: Software quality prediction plays an important role in improving the quality of software systems. By mining software metrics, predictive models can be induced that give software managers insight into the quality problems they need to tackle. Objective: Traditional supervised learning approaches dominate software quality prediction, and the resulting models tend to be project specific. Moreover, when there are no previous releases, supervised learning approaches are of limited use because large training data sets are needed to develop accurate predictive models. Method: This paper aims to ease these limitations of supervised learning while still offering good prediction performance. We propose an adaptive approach in which supervised learning and active learning are coupled. A Naive Bayes classifier is used as the base learner. Results: We track the performance at each iteration of the adaptive learning algorithm and compare it with the performance of supervised learning. Our results show that the proposed scheme provides good fault prediction performance over time, i.e., it eventually outperforms the corresponding supervised learning approach. In addition, the adaptive learning approach reduces the variance in prediction performance compared with the corresponding supervised learning algorithm. Conclusion: The adaptive approach outperforms the corresponding supervised learning approach when both use Naive Bayes as the base learner. Additional research is needed to investigate whether this observation remains valid with other base classifiers.
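As an illustration of how supervised learning and active learning can be coupled around a Naive Bayes base learner, the following is a minimal sketch and not the paper's actual algorithm. The abstract does not state the query strategy, batch size, or metrics, so everything here is an assumption: uncertainty sampling is used to choose which modules to label, the two features (a size metric and a complexity metric) are synthetic, and the names `fit_nb`, `p_faulty`, and `adaptive_learning` are hypothetical.

```python
import math
import random

def fit_nb(X, y):
    """Fit a Gaussian Naive Bayes model: per-class prior, means, variances."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # Small constant keeps variances positive when a class has few samples.
        varis = [sum((v - m) ** 2 for v in col) / n + 1e-6
                 for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(y), means, varis)
    return model

def p_faulty(model, x):
    """Posterior probability that module x is faulty (label 1)."""
    scores = {}
    for label, (prior, means, varis) in model.items():
        s = math.log(prior)
        for v, m, var in zip(x, means, varis):
            s += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        scores[label] = s
    top = max(scores.values())  # subtract the max for numerical stability
    exp = {k: math.exp(s - top) for k, s in scores.items()}
    return exp.get(1, 0.0) / sum(exp.values())

def accuracy(model, labelled):
    hits = sum((p_faulty(model, x) >= 0.5) == bool(t) for x, t in labelled)
    return hits / len(labelled)

def adaptive_learning(seed, pool, oracle, test, rounds=5, batch=5):
    """Retrain each round, then query labels for the most uncertain modules."""
    X = [x for x, _ in seed]
    y = [t for _, t in seed]
    pool = list(pool)
    history = []  # test accuracy tracked at each iteration
    for _ in range(rounds):
        model = fit_nb(X, y)
        history.append(accuracy(model, test))
        # Assumed query strategy: posterior closest to 0.5 is most uncertain.
        pool.sort(key=lambda x: abs(p_faulty(model, x) - 0.5))
        queried, pool = pool[:batch], pool[batch:]
        X.extend(queried)
        y.extend(oracle[x] for x in queried)  # the oracle stands in for an expert
    return fit_nb(X, y), history, len(y)

# Synthetic modules (assumption): faulty ones tend to be larger and more complex.
rng = random.Random(0)
modules = []
for i in range(200):
    faulty = 1 if i % 3 == 0 else 0
    size = rng.gauss(300 if faulty else 100, 60)
    complexity = rng.gauss(20 if faulty else 8, 4)
    modules.append(((size, complexity), faulty))

oracle = dict(modules)
seed = modules[:6]                       # small labelled start, both classes
pool = [x for x, _ in modules[6:150]]    # unlabelled modules
test = modules[150:]                     # held-out modules for tracking
model, history, n_labelled = adaptive_learning(seed, pool, oracle, test)
print(history, n_labelled)
```

The loop starts from only six labelled modules and requests labels in small batches, which mirrors the setting where no previous release supplies a large training set. Recording accuracy in `history` at every round corresponds to tracking performance per iteration; comparing it against a purely supervised baseline would need a separate run.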
