Prediction of faults-slip-through in large software projects: an empirical evaluation

A large percentage of the cost of rework can be avoided by finding more faults earlier in the software test process. Determining which test phases to focus improvement work on is therefore of considerable industrial interest. We evaluate a number of techniques for predicting the number of faults slipping through to the unit, function, integration, and system test phases of a large industrial project. The objective is to quantify the improvement potential in different test phases by striving toward finding faults in the right phase. The results show that a range of techniques is useful for predicting the number of faults slipping through to the four test phases; however, the group of search-based techniques (genetic programming, gene expression programming, artificial immune recognition system, and particle swarm optimization–based artificial neural network) consistently gives better predictions and is represented at all four test phases. Human predictions are consistently better at two of the four test phases. We conclude that human predictions of the number of faults slipping through to the various test phases can be well supported by search-based techniques. A combination of human prediction and an automated search mechanism (such as any of the search-based techniques) has the potential to provide improved prediction results.
