Calculation and optimization of thresholds for sets of software metrics

In this article, we present a novel algorithmic method for calculating thresholds for a metric set using machine learning and data mining techniques. We define a data-driven methodology that can be used to optimize the efficiency of existing metric sets, to simplify complex classification models, and to calculate thresholds for a metric set in an environment where no metric set yet exists. The methodology is independent of the metric set and therefore also independent of any language, paradigm, or abstraction level. In four case studies performed on large-scale open-source software, metric sets for C functions, C++ and C# methods, and Java classes are optimized and the methodology is validated.
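The abstract does not give the algorithm itself, so what follows is only a minimal sketch of threshold-based classification with a metric set, under assumptions that are ours. Each entity (for example a function or class) is assumed to be a vector of metric values where larger is worse, and each entity carries a label of acceptable or problematic. The sketch learns thresholds as the tightest upper bounds covering all acceptable entities (an axis-parallel bounding box). It classifies an entity as acceptable only if no metric exceeds its threshold and scores the result with the Matthews correlation coefficient. It then shrinks the metric set by greedy backward elimination, dropping a metric whenever the score does not decrease. The function names, the toy metrics, and the elimination strategy are hypothetical illustrations, not the authors' method.

```python
from math import sqrt

# Hypothetical toy data: one row per function with three metrics,
# [lines of code, cyclomatic complexity, nesting depth]; larger is worse.
SAMPLES = [
    [20, 3, 1], [35, 5, 2], [50, 7, 2],      # labelled acceptable
    [120, 25, 5], [40, 12, 3], [300, 9, 2],  # labelled problematic
]
LABELS = [True, True, True, False, False, False]


def learn_thresholds(samples, labels):
    """Tightest upper threshold per metric: the largest value observed
    among entities labelled acceptable (an axis-parallel bounding box)."""
    good = [s for s, ok in zip(samples, labels) if ok]
    if not good:
        raise ValueError("need at least one acceptable entity")
    return [max(column) for column in zip(*good)]


def within_thresholds(entity, thresholds):
    """Classify as acceptable only if no metric exceeds its threshold."""
    return all(value <= limit for value, limit in zip(entity, thresholds))


def mcc(predicted, actual):
    """Matthews correlation coefficient for boolean predictions."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def score(samples, labels, metrics):
    """Learn thresholds on the chosen metric columns and score them with MCC.
    Evaluated on the training data itself, so the score is optimistic."""
    projected = [[s[i] for i in metrics] for s in samples]
    thresholds = learn_thresholds(projected, labels)
    predicted = [within_thresholds(p, thresholds) for p in projected]
    return mcc(predicted, labels)


def reduce_metric_set(samples, labels):
    """Greedy backward elimination: drop a metric whenever doing so does not
    lower the MCC. A simple stand-in, not the paper's optimization procedure."""
    kept = list(range(len(samples[0])))
    best = score(samples, labels, kept)
    for metric in list(kept):
        trial = [m for m in kept if m != metric]
        if trial:
            trial_score = score(samples, labels, trial)
            if trial_score >= best:
                kept, best = trial, trial_score
    return kept, best


if __name__ == "__main__":
    print(learn_thresholds(SAMPLES, LABELS))   # [50, 7, 2]
    print(reduce_metric_set(SAMPLES, LABELS))  # ([1], 1.0)
```

On this toy data the cyclomatic-complexity column alone separates the two classes with a threshold of 7, so the reduced set keeps only that metric. Scoring on the training data, as done here, overstates quality; a real evaluation would use held-out entities.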
