Software defect prediction in large space systems through hybrid feature selection and classification

Data mining and machine learning techniques have been used in several scientific applications, including software fault prediction in large space systems. State-of-the-art research has revealed that existing space systems are susceptible to elusive software faults that can lead to critical loss of life and capital. This article presents a novel approach to the problem of overlooked software faults, combining feature selection and classification techniques to accurately predict software defects in aerospace systems. The main objective was to identify the feature selection and prediction techniques that best improve software fault prediction accuracy using an optimal set of features. The investigations confirmed that a novel hybrid feature selection method identified the optimal set of predictive features, although no single predictive technique was suitable for predicting faults across all space system datasets. Moreover, applying data mining techniques to fault prediction on the NASA Lunar space system software data showed improved fault prediction accuracy (~82% to ~98%) with the feature set selected by the proposed Hybrid Feature Selection method. In addition, random subsampling yielded improved mean Matthews Correlation Coefficient (MCC) and accuracy values, ranging from ~0.7 to ~0.9 and from ~86% to ~98%, respectively. We believe this opens further scope for investigating the space system features that contribute most to fault prediction, thereby enabling the design of aerospace systems with minimal faults and enhanced performance.
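The evaluation protocol summarized above (repeated random subsampling scored with MCC and accuracy) can be sketched as follows. This is a minimal, stdlib-only Python illustration under stated assumptions, not the authors' implementation: the abstract does not specify the classifiers, the hybrid feature selection procedure, the number of subsampling rounds, or the train/test ratio, so `one_nearest_neighbour` (a 1-nearest-neighbour classifier), `rounds=10`, `test_fraction=0.3`, the hypothetical feature subset `selected`, and the synthetic toy data are all illustrative choices. MCC is computed from the confusion-matrix counts as (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)), with faulty modules treated as the positive class.

```python
import math
import random


def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient for binary labels (1 = faulty, 0 = fault-free)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: MCC is taken as 0 when any confusion-matrix margin is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def one_nearest_neighbour(X_train, y_train, X_test):
    """Stand-in classifier (1-NN, squared Euclidean distance); an assumption, not the paper's model."""
    preds = []
    for x in X_test:
        nearest = min(
            range(len(X_train)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(X_train[i], x)),
        )
        preds.append(y_train[nearest])
    return preds


def random_subsampling(X, y, train_and_predict, rounds=10, test_fraction=0.3, seed=0):
    """Repeated random train/test splits; returns (mean MCC, mean accuracy).

    rounds and test_fraction are illustrative defaults, not values from the paper.
    """
    rng = random.Random(seed)
    idx = list(range(len(y)))
    n_test = max(1, int(len(y) * test_fraction))
    mccs, accs = [], []
    for _ in range(rounds):
        rng.shuffle(idx)  # a fresh random split each round
        test, train = idx[:n_test], idx[n_test:]
        preds = train_and_predict(
            [X[i] for i in train], [y[i] for i in train], [X[i] for i in test]
        )
        truth = [y[i] for i in test]
        mccs.append(mcc(truth, preds))
        accs.append(accuracy(truth, preds))
    return sum(mccs) / rounds, sum(accs) / rounds


if __name__ == "__main__":
    # Synthetic toy data (NOT the NASA Lunar data): feature 0 tracks the label, feature 1 is noise.
    rng = random.Random(1)
    y = [0, 1] * 50
    X = [[lab + rng.gauss(0, 0.3), rng.gauss(0, 5.0)] for lab in y]
    selected = [0]  # hypothetical subset, standing in for a feature selector's output
    X_sel = [[row[j] for j in selected] for row in X]
    print("all features  (MCC, acc):", random_subsampling(X, y, one_nearest_neighbour))
    print("selected only (MCC, acc):", random_subsampling(X_sel, y, one_nearest_neighbour))
```

Passing the full feature matrix and a reduced one through the same `random_subsampling` call is one way to compare a selected feature set against the full set; any real selector (filter, wrapper, or hybrid) would simply supply the column indices in `selected`.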
