Performance Analysis of Datamining Algorithms for Software Quality Prediction

Data mining techniques are applied to build software fault prediction models that help improve software quality. Early identification of high-risk modules can direct quality enhancement efforts toward modules that are likely to have a high number of faults. Classification tree models are simple and effective as software quality prediction models, and timely predictions of defects from such models can be used to achieve high software reliability. In this paper, the performance of five data mining classifiers, J48, CART, Random Forest, BFTree and the Naïve Bayesian classifier (NBC), is evaluated using 10-fold cross-validation. Experimental results on the KC2 NASA software metrics dataset demonstrate that decision trees are very useful for fault prediction. Based on the rules generated, only some of the measurement attributes in the given set of metrics play an important role in establishing the final rules and in improving software quality through correct predictions. We therefore suggest that these attributes are sufficient for future classification. The performance of the algorithms is evaluated using Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), Receiver Operating Characteristic (ROC) and Accuracy measures.
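As a rough illustration of the evaluation protocol named in the abstract, the sketch below splits sample indices into 10 folds and computes MAE, RMSE, accuracy and the area under the ROC curve for a binary fault/no-fault task. It is not the authors' code. The function names, the 0.5 decision threshold, the contiguous (unshuffled, unstratified) fold layout, and the choice to compute MAE and RMSE on the predicted probability of the faulty class are all assumptions made for this example; data mining toolkits may define these measures slightly differently.

```python
import math

def k_fold_indices(n_samples, k=10):
    """Split indices 0..n_samples-1 into k contiguous folds of near-equal size.
    (Assumption: no shuffling or stratification, to keep the sketch short.)"""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def mean_absolute_error(y_true, p_pos):
    """Average absolute gap between the 0/1 label and the predicted probability."""
    return sum(abs(y - p) for y, p in zip(y_true, p_pos)) / len(y_true)

def root_mean_squared_error(y_true, p_pos):
    """Square root of the average squared gap between label and probability."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, p_pos)) / len(y_true))

def accuracy(y_true, p_pos, threshold=0.5):
    """Fraction of modules whose thresholded prediction matches the label."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, p_pos))
    return hits / len(y_true)

def roc_auc(y_true, p_pos):
    """Probability that a random faulty module is scored above a random
    fault-free one; ties count as one half."""
    pos = [p for y, p in zip(y_true, p_pos) if y == 1]
    neg = [p for y, p in zip(y_true, p_pos) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Toy labels (1 = faulty) and hypothetical predicted probabilities.
y_true = [1, 0, 1, 0]
p_pos = [0.9, 0.2, 0.4, 0.6]
folds = k_fold_indices(25, k=10)
```

In a full run, each fold would serve once as the test set while a classifier is trained on the other nine, and the four measures would be computed over the pooled or averaged test predictions.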
