Metrics-based Detection of Similar Software

This paper presents a quantitative approach to identifying similarity among object-oriented systems. The approach makes three main contributions: a) a mechanism to derive thresholds for a specific metric, considering different class profiles; b) a mechanism to obtain a subset of similar systems from a portfolio of systems according to one software metric, using well-known clustering techniques; and c) a mechanism to obtain subsets of similar systems according to a set of software metrics, using concepts from graph theory. We also present SQComp, a tool that supports our approach. Using SQComp, we evaluated the approach on a corpus of 103 open-source systems comprising more than 16 MLOC. As a result, we were able to find several groups of systems with strong indications of similarity.
