Building Defect Prediction Models in Practice

Information about which modules of a future version of a software system will be defect-prone is a valuable planning aid for quality managers and testers. Defect prediction promises to identify these defect-prone modules. In this chapter, building a defect prediction model from data is characterized as an instance of a data-mining task, and the key questions and consequences that arise when establishing defect prediction in a large software development project are discussed. Special emphasis is placed on how to choose a learning algorithm, select features from different data sources, deal with noise and data-quality issues, and evaluate models for evolving systems. These discussions are accompanied by insights and experiences from projects on data mining and defect prediction in large software systems that the authors have conducted over the past several years. One of these projects serves as an illustrative use case throughout the chapter.
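To make the framing of defect prediction as a data-mining task concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' method. It assumes hypothetical per-module records that merge a static code metric (`loc`) with a change metric (`churn`) taken from version control, uses a one-feature decision stump as a stand-in for whatever learning algorithm one would actually choose, and evaluates the model the way it would be used in an evolving system: train on one release, then score predictions on the next. All names (`Module`, `train_stump`, `evaluate_next_release`) and all values are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Module:
    """One module of one release (hypothetical record layout)."""
    name: str
    features: dict[str, float]  # e.g. a static code metric plus a change metric
    defective: bool             # label, e.g. derived from post-release bug reports


def f1(predicted: list[bool], actual: list[bool]) -> float:
    """Harmonic mean of precision and recall for the 'defect-prone' class."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def train_stump(modules: list[Module]) -> tuple[str, float]:
    """Decision stump: choose the (feature, threshold) pair whose rule
    'feature >= threshold -> defect-prone' maximizes F1 on the training release.
    Assumes a non-empty training release."""
    actual = [m.defective for m in modules]
    best = None
    for feature in modules[0].features:
        for threshold in sorted({m.features[feature] for m in modules}):
            predicted = [m.features[feature] >= threshold for m in modules]
            score = f1(predicted, actual)
            if best is None or score > best[0]:
                best = (score, feature, threshold)
    return best[1], best[2]


def evaluate_next_release(train: list[Module], test: list[Module]) -> tuple[str, float, float]:
    """Fit on release N and score predictions on release N+1, so no data
    from the later release leaks into training."""
    feature, threshold = train_stump(train)
    predicted = [m.features[feature] >= threshold for m in test]
    return feature, threshold, f1(predicted, [m.defective for m in test])


# Made-up toy values for illustration only; not data from the chapter's use case.
release_1 = [
    Module("a", {"loc": 120, "churn": 5}, False),
    Module("b", {"loc": 900, "churn": 40}, True),
    Module("c", {"loc": 300, "churn": 2}, False),
    Module("d", {"loc": 450, "churn": 35}, True),
    Module("e", {"loc": 1000, "churn": 3}, False),
]
release_2 = [
    Module("a", {"loc": 130, "churn": 50}, True),
    Module("b", {"loc": 950, "churn": 4}, False),
    Module("c", {"loc": 310, "churn": 38}, True),
    Module("d", {"loc": 460, "churn": 1}, False),
    Module("f", {"loc": 200, "churn": 20}, True),  # module new in release 2
]

feature, threshold, score = evaluate_next_release(release_1, release_2)
print(f"rule: {feature} >= {threshold}, F1 on next release: {score:.2f}")
# prints "rule: churn >= 35, F1 on next release: 0.80"
```

Splitting by release rather than at random keeps later data out of training and exposes the model to modules that did not exist when it was built; on this toy data, the rule learned from release 1 misses the new module `f`.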
