Predicting Fault-prone Software Module Using Data Mining Technique and Fuzzy Logic

This paper discusses a new model towards reliability and quality improvement of software systems by predicting fault-prone module before testing. Model utilizes the classification capability of data mining techniques and knowledge stored in software metrics to classify the software module as fault-prone or not fault-prone. A decision tree is constructed using ID3 algorithm for existing project data in order to gain information for the purpose of decision making whether a particular module id fault-prone or not. The gained information is converted into fuzzy rules and integrated with fuzzy inference system to predict fault-prone or not fault-prone software module for target data. The model is also able to predict fault-proneness degree of faulty module. The goal is to help software manager to concentrate their testing efforts to fault-prone modules in order to improve the reliability and quality of the software system. We used NASA projects data set from the PROMOSE repository to validate the predictive accuracy of the model.

[1]  Taghi M. Khoshgoftaar,et al.  Software Quality Classification Modeling Using the SPRINT Decision Tree Algorithm , 2003, Int. J. Artif. Intell. Tools.

[2]  Tong-Seng Quah,et al.  Application of neural networks for software quality prediction using object-oriented metrics , 2003, International Conference on Software Maintenance, 2003. ICSM 2003. Proceedings..

[3]  Petra Perner,et al.  Data Mining - Concepts and Techniques , 2002, Künstliche Intell..

[4]  Edward B. Allen,et al.  GP-based software quality prediction , 1998 .

[5]  Letha H. Etzkorn,et al.  Empirical Validation of Three Software Metrics Suites to Predict Fault-Proneness of Object-Oriented Classes Developed Using Highly Iterative or Agile Software Development Processes , 2007, IEEE Transactions on Software Engineering.

[6]  Tim Menzies,et al.  Data Mining Static Code Attributes to Learn Defect Predictors , 2007, IEEE Transactions on Software Engineering.

[7]  Banu Diri,et al.  A systematic review of software fault prediction studies , 2009, Expert Syst. Appl..

[8]  Nicolino J. Pizzi,et al.  Software quality prediction using fuzzy integration: a case study , 2007, Soft Comput..

[9]  Ian Witten,et al.  Data Mining , 2000 .

[10]  Banu Diri,et al.  Investigating the effect of dataset size, metrics sets, and feature selection techniques on software fault prediction problem , 2009, Inf. Sci..

[11]  Ratna Babu Chinnam,et al.  A fuzzy logic based approach to reliability improvement estimation during product development , 2003, Reliab. Eng. Syst. Saf..

[12]  John D. Musa,et al.  Software reliability - measurement, prediction, application , 1987, McGraw-Hill series in software engineering and technology.

[13]  Neeraj Kumar Goyal,et al.  A Fuzzy Model for Early Software Fault Prediction Using Process Maturity and Software Metrics , 2009 .

[14]  Taghi M. Khoshgoftaar,et al.  Software quality estimation with limited fault data: a semi-supervised learning perspective , 2007, Software Quality Journal.

[15]  John D. Musa,et al.  Software reliability measurement , 1984, J. Syst. Softw..

[16]  Javam C. Machado,et al.  The prediction of faulty classes using object-oriented design metrics , 2001, J. Syst. Softw..

[17]  Taghi M. Khoshgoftaar,et al.  Software Quality Analysis of Unlabeled Program Modules With Semisupervised Clustering , 2007, IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans.

[18]  Joanne Bechta Dugan,et al.  Empirical Analysis of Software Fault Content and Fault Proneness Using Bayesian Methods , 2007, IEEE Transactions on Software Engineering.

[19]  J. Ross Quinlan,et al.  Induction of Decision Trees , 1986, Machine Learning.

[20]  Banu Diri,et al.  A Fault Prediction Model with Limited Fault Data to Improve Test Process , 2008, PROFES.

[21]  Bojan Cukic,et al.  Predicting fault prone modules by the Dempster-Shafer belief networks , 2003, 18th IEEE International Conference on Automated Software Engineering, 2003. Proceedings..

[22]  Norman F. Schneidewind,et al.  Investigation of logistic regression as a discriminant of software quality , 2001, Proceedings Seventh International Software Metrics Symposium.

[23]  Tim Menzies,et al.  The \{PROMISE\} Repository of Software Engineering Databases. , 2005 .

[24]  Lotfi A. Zadeh,et al.  Fuzzy Sets , 1996, Inf. Control..

[25]  Karim O. Elish,et al.  Predicting defect-prone software modules using support vector machines , 2008, J. Syst. Softw..

[26]  Taghi M. Khoshgoftaar,et al.  An empirical study of predicting software faults with case-based reasoning , 2006, Software Quality Journal.

[27]  Khaled El Emam,et al.  Comparing case-based reasoning classifiers for predicting high risk software components , 2001, J. Syst. Softw..

[28]  Iker Gondra,et al.  Applying machine learning to software fault-proneness prediction , 2008, J. Syst. Softw..