Fault Prediction by Utilizing Self-Organizing Map and Threshold

Predicting which parts of a program are more defect-prone can ease the software testing process and reduce testing cost and time. Fault prediction models use software metrics and defect data from earlier or similar versions of a project to improve software quality and make better use of available resources. However, constraints such as cost, experience, and time limit the availability of fault data for modules or classes. In such cases, researchers turn to unsupervised techniques such as clustering and rely on experts or thresholds to label modules as faulty or not faulty. In this paper, we propose a prediction model that combines a self-organizing map (SOM) with thresholds; it helps testers in the labeling process and no longer requires experts to label the modules. Data sets obtained from three Turkish white-goods controller software systems are used in our empirical investigation. The results show that the proposed technique helps testers make better estimates in most cases in terms of overall error rate, false positive rate (FPR), and false negative rate (FNR).
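
To make the general idea concrete, the following is a minimal sketch of clustering modules with a small SOM and then labeling them by metric thresholds, followed by the three evaluation measures named above. It is not the authors' implementation. The labeling rule (a SOM unit is treated as faulty when any component of its weight vector exceeds the corresponding threshold), the two metric columns, the threshold values, the grid size, and the function names train_som, label_modules, and error_rates are all assumptions made for illustration.

```python
import numpy as np

def train_som(data, rows=2, cols=2, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a tiny rectangular SOM; returns weights of shape (rows*cols, n_features)."""
    rng = np.random.default_rng(seed)
    # Grid coordinates of each unit, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise each unit's weight vector from a randomly chosen sample.
    weights = data[rng.choice(len(data), size=rows * cols, replace=True)].astype(float)
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                 # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 1e-3    # neighbourhood radius shrinks
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))  # best-matching unit
            dist2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
            h = np.exp(-dist2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def label_modules(data, weights, thresholds):
    """Return 1 (faulty) or 0 (not faulty) per module, based on its best-matching unit.

    Assumed rule: a unit is faulty if any weight component exceeds its threshold.
    """
    unit_faulty = np.any(weights > thresholds, axis=1)
    dists = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return unit_faulty[np.argmin(dists, axis=1)].astype(int)

def error_rates(y_true, y_pred):
    """Return (overall error rate, FPR, FNR) for binary labels, with 1 meaning faulty."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    overall = (fp + fn) / len(y_true)
    fpr = fp / max(fp + tn, 1)   # share of non-faulty modules flagged as faulty
    fnr = fn / max(fn + tp, 1)   # share of faulty modules that were missed
    return overall, fpr, fnr

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic data: columns are hypothetical metrics (lines of code, cyclomatic complexity).
    low = rng.normal([20.0, 3.0], [5.0, 1.0], size=(40, 2))
    high = rng.normal([120.0, 18.0], [15.0, 3.0], size=(10, 2))
    X = np.vstack([low, high])
    y = np.array([0] * 40 + [1] * 10)
    thresholds = np.array([65.0, 10.0])  # illustrative values only
    som = train_som(X)
    print(error_rates(y, label_modules(X, som, thresholds)))
```

Because labels are assigned per SOM unit rather than per module, a tester only needs to inspect a handful of weight vectors against the thresholds instead of labeling every module by hand.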
