Predicting faults in software modules can lead to a higher-quality and more effective software development process. However, the results of a fault prediction model have to be properly interpreted before they are incorporated into any decision making. Most earlier studies have used prediction accuracy as the main criterion for comparing competing fault prediction models. We show that, besides accuracy, other criteria such as the number of false positives and false negatives can be equally important when choosing a candidate model for fault prediction. We used five NASA software data sets in our experiment. Our results suggest that Simple Logistic performs better than the other models on the raw data sets, whereas Neural Network performed better when we applied a dimensionality reduction method to the raw data sets. When we used data pre-processing techniques, Random Forest had the best prediction accuracy in both cases, i.e. with and without dimensionality reduction, but Simple Logistic was more reliable than Random Forest because it produced fewer false negatives.
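The point that accuracy alone can hide important differences between models can be illustrated with a minimal sketch. The labels below are hypothetical and are not taken from the NASA data sets; the names `ConfusionCounts` and `confusion_counts` are introduced here only for illustration. Two made-up models reach the same accuracy, yet one misses faulty modules (false negatives) while the other raises false alarms (false positives).

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # faulty modules predicted faulty
    fp: int  # fault-free modules predicted faulty (false alarms)
    tn: int  # fault-free modules predicted fault-free
    fn: int  # faulty modules predicted fault-free (missed faults)

    @property
    def accuracy(self) -> float:
        total = self.tp + self.fp + self.tn + self.fn
        return (self.tp + self.tn) / total if total else 0.0


def confusion_counts(actual, predicted) -> ConfusionCounts:
    """Count TP, FP, TN, FN for boolean labels (True = faulty)."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a and p:
            tp += 1
        elif p:          # predicted faulty, actually fault-free
            fp += 1
        elif a:          # predicted fault-free, actually faulty
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fp, tn, fn)


# Hypothetical ground truth for 10 modules (assumption, not real data).
actual = [True, True, True, True, False, False, False, False, False, False]
# Hypothetical model A: misses two faulty modules, no false alarms.
model_a = [True, False, False, True, False, False, False, False, False, False]
# Hypothetical model B: finds every faulty module, two false alarms.
model_b = [True, True, True, True, True, True, False, False, False, False]

for name, predicted in (("A", model_a), ("B", model_b)):
    c = confusion_counts(actual, predicted)
    print(f"model {name}: accuracy={c.accuracy:.2f} FP={c.fp} FN={c.fn}")
```

Both hypothetical models score 0.80 on accuracy, but model A leaves two faulty modules undetected while model B only costs two unnecessary inspections. Which error matters more depends on the project, which is why the error counts are worth reporting alongside accuracy.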