An empirical study of predicting software faults with case-based reasoning

The resources allocated to software quality assurance and improvement have not kept pace with the ever-increasing need for better software quality. Targeted software quality inspection can detect faulty modules and reduce the number of faults that occur during operations. We present a software fault prediction modeling approach based on case-based reasoning (CBR), a part of the computational intelligence field that focuses on automated reasoning processes. A CBR system functions as a software fault prediction model by quantifying, for a module under development, the expected number of faults based on similar modules that were previously developed. Such a system is composed of a similarity function, the number of nearest neighbor cases used for fault prediction, and a solution algorithm. The choice of similarity function and solution algorithm may affect the prediction accuracy of a CBR-based software fault prediction system. This paper presents an empirical study of the effects of three different similarity functions and two different solution algorithms on the prediction accuracy of our CBR system. The influence of varying the number of nearest neighbor cases on prediction accuracy is also explored, and the benefits of using metric-selection procedures for our CBR system are evaluated. Case studies of a large legacy telecommunications system are used for our analysis. The CBR system using the Mahalanobis distance similarity function and the inverse distance weighted solution algorithm yielded the best fault prediction. In addition, the CBR models performed better than models based on multiple linear regression.
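The three components named in the abstract (a similarity function, a number of nearest neighbor cases, and a solution algorithm) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the case-library layout as `(metrics, faults)` pairs, the use of the standard Mahalanobis definition with a sample covariance matrix, the `1/d` weighting, and the handling of zero-distance matches are all assumptions made for illustration.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of the metric vectors (assumed estimator)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * c for v, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mahalanobis(a, b, inv_cov):
    """Standard Mahalanobis distance: sqrt((a-b)^T S^-1 (a-b))."""
    diff = [x - y for x, y in zip(a, b)]
    tmp = [sum(inv_cov[i][j] * diff[j] for j in range(len(diff)))
           for i in range(len(diff))]
    return math.sqrt(max(0.0, sum(d * t for d, t in zip(diff, tmp))))


def predict_faults(target, case_library, distance, n_neighbors=3, weighted=True):
    """Predict the fault count of `target` from its nearest past cases.

    case_library: list of (metrics, faults) pairs (hypothetical layout).
    weighted=False: unweighted average of the neighbors' fault counts.
    weighted=True: inverse distance weighted average (weights 1/d, an
    assumed form of the weighting).
    """
    ranked = sorted(((distance(target, m), f) for m, f in case_library),
                    key=lambda pair: pair[0])
    nearest = ranked[:n_neighbors]
    if not weighted:
        return sum(f for _, f in nearest) / len(nearest)
    # Assumption: if any neighbor matches exactly, average those matches
    # instead of dividing by a zero distance.
    exact = [f for d, f in nearest if d == 0.0]
    if exact:
        return sum(exact) / len(exact)
    weights = [1.0 / d for d, _ in nearest]
    return sum(w * f for w, (_, f) in zip(weights, nearest)) / sum(weights)


# Hypothetical case library: (metric vector, observed faults).
library = [([10.0, 2.0], 1), ([12.0, 3.0], 2), ([40.0, 9.0], 7), ([35.0, 6.0], 5)]
inv_cov = invert(covariance_matrix([metrics for metrics, _ in library]))
estimate = predict_faults([11.0, 3.0], library,
                          lambda a, b: mahalanobis(a, b, inv_cov), n_neighbors=2)
```

Passing a different `distance` callable or setting `weighted=False` swaps the similarity function or solution algorithm without touching the rest of the sketch, which mirrors how the study varies these components one at a time.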
