Weighted software metrics aggregation and its application to defect prediction

It is common practice in software engineering to aggregate software metrics to assess software artifacts for various purposes, such as their maintainability or their proneness to contain bugs. Different purposes may call for different metrics. However, weighting these software metrics according to their contribution to the respective purpose is a challenging task. Manual approaches based on expert judgment do not scale with the number of metrics, and experts struggle to assign consistent weights when the metrics are not independent, which is rarely the case. Automated approaches based on supervised learning require reliable and generalizable training data, i.e., a ground truth, which is rarely available. We propose an automated approach to weighted metrics aggregation that is based on unsupervised learning. It derives metric scores and their weights from probability theory and aggregates them. To evaluate its effectiveness, we conducted two empirical studies on defect prediction: one on ca. 200 000 code changes and the other on ca. 5 000 software classes. The results show that our approach can serve as an agnostic unsupervised predictor in the absence of a ground truth.
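The abstract describes the approach only at a high level, so the following is a minimal, hypothetical Python sketch of unsupervised weighted aggregation in that spirit: metric values are turned into scores via the empirical cumulative distribution function (ECDF), and per-metric weights are derived from the score distributions using the classic entropy weight method. The function names (`ecdf_scores`, `entropy_weight`, `aggregate`) and both modeling choices are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of unsupervised weighted metrics aggregation.
# ECDF-based scoring and entropy-based weighting are stand-in choices,
# not the formulation used in the paper.

import numpy as np

def ecdf_scores(values):
    """Score each unit as the empirical probability that a randomly
    chosen unit has a metric value at most as large, so a higher raw
    metric value yields a higher score (e.g. higher risk)."""
    values = np.asarray(values, dtype=float)
    ranks = values.argsort().argsort()  # 0-based rank of each value
    return (ranks + 1) / len(values)

def entropy_weight(scores, bins=10):
    """Weight a metric by how unevenly its scores are distributed:
    metrics with low-entropy (more discriminating) score distributions
    receive more weight. This is the standard entropy weight method,
    used here purely as an example."""
    hist, _ = np.histogram(scores, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    h = -(p * np.log(p)).sum() / np.log(bins)  # normalized entropy in [0, 1]
    return 1.0 - h

def aggregate(metric_matrix):
    """metric_matrix: shape (n_units, n_metrics). Returns one aggregated
    score per unit (e.g. per code change or per software class)."""
    scored = np.column_stack([ecdf_scores(col) for col in metric_matrix.T])
    weights = np.array([entropy_weight(col) for col in scored.T])
    weights = weights / weights.sum()
    return scored @ weights

# Example: rank five units by three metrics (say LOC, complexity, churn).
X = np.array([[120,  4, 2],
              [300,  9, 7],
              [ 80,  2, 1],
              [500, 15, 3],
              [210,  6, 5]], dtype=float)
print(aggregate(X))  # higher score = more defect-prone under this sketch
```

Under this sketch, replacing the scoring or weighting scheme only requires swapping out the two helper functions; the aggregation itself stays a weighted sum of per-metric scores.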
