Measure software - and its evolution - using information content

To examine software evolution (the variation in software over a sequence of releases) or to compare different versions of software with each other, we need to be able to measure artefacts that are representative of the software or of its creation process. The literature offers a multitude of approaches both to measuring software, by defining and applying software metrics, and to examining software evolution in terms of these metrics. In this position paper, we claim that information content, specifically the (relative) Kolmogorov complexity, is the correct and fundamental tool for the measurement of software artefacts. Experimental results obtained from an analysis of the project udev demonstrate the utility of the approach; future work should explore the breadth of its applicability and determine its full scope.
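Kolmogorov complexity is not computable, so any practical measurement has to rely on an approximation. The sketch below illustrates one common stand-in, the output length of a general-purpose compressor, and is not necessarily the procedure used in the paper. The choice of zlib and the function names `compressed_size`, `relative_complexity` and `ncd` are assumptions made for this illustration.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Bytes needed by zlib at maximum level: a computable proxy for K(data)."""
    return len(zlib.compress(data, 9))


def relative_complexity(x: bytes, given: bytes) -> int:
    """Approximate K(x | given) as the extra compressed bytes x costs after `given`.

    Assumption: C(given + x) - C(given) is used as the estimate.
    """
    return compressed_size(given + x) - compressed_size(given)


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two artefacts.

    Values near 0 suggest the artefacts share most of their information;
    values near 1 suggest they share little.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

Applied to the byte contents of two releases of the same artefact, `ncd` gives a rough, compressor-dependent estimate of how much one release differs from the other. Because zlib only finds repeats within a 32 KiB window, a compressor with a larger window would be the safer choice for bigger artefacts.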
