Replacing code metrics in software fault prediction with early life cycle metrics

Fault prediction models are typically built using software metrics collected throughout the software lifecycle. When no previous release of the software product is available, the earlier software metrics are collected, the earlier prediction models can be built to guide software verification and validation activities. In this experiment, we investigate a question in software fault prediction modeling: can later code metrics be replaced by earlier design metrics? Using Canonical Correlation Analysis (CCA), a multivariate statistical analysis method, we find that 11 code metrics can be replaced by 6 design metrics. When these 11 replaceable code metrics are excluded from model building, the resulting fault prediction models typically perform statistically the same as models built with all code metrics. This study shows that design metrics, which are available earlier, can be used to replace late lifecycle code metrics. This would make it possible to identify faults earlier in the software lifecycle, before code is implemented. Furthermore, because metric collection is expensive, using fewer metrics while maintaining the same predictive power offers potentially high cost savings in IV&V activities.
