Software Documents: Comparison and Measurement

For some time now, researchers have sought to place software measurement on a firmer footing by establishing a theoretical basis for software comparison. Although some work has applied information-theoretic concepts, particularly entropy and entropy-like measures, to the quantification of code documents, we propose that using the Similarity Metric of Li, Vitányi, and coworkers to compare software documents will lead to a theoretically justifiable means of comparing and evaluating software artifacts. In this paper, we review previous work on software measurement with particular emphasis on its information-theoretic aspects, examine the body of work on Kolmogorov complexity (on which the Similarity Metric is based), and report on experiments that lend credence to our proposals. Finally, we discuss the potential advantages of applying this theory to areas of software engineering.
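The Similarity Metric (the normalized information distance) is defined in terms of Kolmogorov complexity, which is not computable, so applications approximate it with a real compressor. As a minimal sketch of that idea, and not a reproduction of the experiments reported here, the Python code below computes the normalized compression distance NCD(x, y) = (C(xy) - min{C(x), C(y)}) / max{C(x), C(y)} used in the compression-based clustering work cited as [46], where C(s) is the compressed length of s. Using the standard-library zlib module as the compressor, and the helper names compressed_size and ncd, are assumptions made for illustration; the example documents are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Return the zlib-compressed length of `data` in bytes.

    Assumption: this stands in for the uncomputable Kolmogorov complexity
    K(data); level 9 asks zlib for its strongest (slowest) compression.
    """
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two documents.

    Values near 0 mean the documents share most of their information;
    values near 1 (or slightly above, because real compressors are
    imperfect) mean they share little.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation xy
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Hypothetical documents: a small function, the same function with one
    # identifier renamed, and an unrelated English sentence.
    original = (
        b"def total(values):\n"
        b"    result = 0\n"
        b"    for v in values:\n"
        b"        result += v\n"
        b"    return result\n"
    )
    renamed = original.replace(b"result", b"acc")
    unrelated = b"The quick brown fox jumps over the lazy dog near the river bank.\n"

    print(f"NCD(original, original)  = {ncd(original, original):.3f}")
    print(f"NCD(original, renamed)   = {ncd(original, renamed):.3f}")
    print(f"NCD(original, unrelated) = {ncd(original, unrelated):.3f}")
```

Smaller distances indicate that two documents share more information. The compressor choice matters: zlib's DEFLATE format only matches content within a 32 KB window, so for larger documents shared material lying farther back than that cannot be exploited and the distance is inflated, which is the kind of compressor pitfall examined in [51].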

[1] Manfred Broy et al. Interaction interfaces - towards a scientific foundation of a methodological usage of message sequence charts, 1998, Proceedings Second International Conference on Formal Engineering Methods (Cat. No. 98EX241).

[2] Maurice H. Halstead et al. Elements of software science (Operating and programming systems series), 1977.

[3] Mark Lorenz et al. Object-oriented software metrics - a practical guide, 1994.

[4] A. Kolmogorov. Three approaches to the quantitative definition of information, 1968.

[5] Taghi M. Khoshgoftaar et al. Are the principal components of software complexity data stable across software products?, 1994, Proceedings of 1994 IEEE 2nd International Software Metrics Symposium.

[6] Ray J. Solomonoff et al. A Formal Theory of Inductive Inference. Part II, 1964, Inf. Control.

[7] Norman E. Fenton et al. When a software measure is not a measure, 1992, Softw. Eng. J.

[8] Ming Li et al. The similarity metric, 2004, IEEE Transactions on Information Theory.

[9] David Lorge Parnas et al. Module Interface Documentation - Using the Trace Function Method (TFM), 2006.

[10] Neal S. Coulter et al. Information-Theoretic Complexity of Program Specifications, 1987, Comput. J.

[11] Taghi M. Khoshgoftaar et al. The dimensionality of program complexity, 1989, ICSE '89.

[12] Gregory J. Chaitin et al. On the Length of Programs for Computing Finite Binary Sequences: statistical considerations, 1969, JACM.

[13] S. J. Prowell. Developing black box specifications through sequence enumeration, 1999, Proceedings. Science and Engineering for Software Development: A Recognition of Harlan D. Mills' Legacy (Cat. No. PR00010).

[14] Letha H. Etzkorn et al. An Entropy-Based Complexity Measure for Object-Oriented Designs, 1999, Theory Pract. Object Syst.

[15] David P. Tegarden et al. A software complexity model of object-oriented systems, 1995, Decis. Support Syst.

[16] Ray J. Solomonoff et al. A Formal Theory of Inductive Inference. Part I, 1964, Inf. Control.

[17] Xin Chen et al. An information-based sequence distance and its application to whole mitochondrial genome phylogeny, 2001, Bioinform.

[18] R. Landauer et al. Irreversibility and heat generation in the computing process, 1961, IBM J. Res. Dev.

[19] Sallie M. Henry et al. Software Structure Metrics Based on Information Flow, 1981, IEEE Transactions on Software Engineering.

[20] Kostadin Koroutchev et al. Detecting translations of the same text and data with common source, 2006.

[21] M. H. van Emden et al. An analysis of complexity, 1971.

[22] Avinash C. Kak et al. API-Based and Information-Theoretic Metrics for Measuring the Quality of Software Modularization, 2007, IEEE Transactions on Software Engineering.

[23] B. Henderson-Sellers. The mathematical validity of software metrics, 1996, SOEN.

[24] Xin Chen et al. Shared information and program plagiarism detection, 2004, IEEE Transactions on Information Theory.

[25] Ming Li et al. An Introduction to Kolmogorov Complexity and Its Applications, 2019, Texts in Computer Science.

[26] Michael Kohlhase et al. MathDox: mathematical documents on the web, 2006.

[27] Thomas M. Cover et al. Elements of Information Theory, 2005.

[28] Paul J. Layzell et al. Spatial complexity metrics: an investigation of utility, 2005, IEEE Transactions on Software Engineering.

[29] M. H. van Emden. On the hierarchical decomposition of complexity, 1969.

[30] L. L. Campbell et al. Entropy as a measure, 1965, IEEE Trans. Inf. Theory.

[31] Jesse H. Poore et al. Measuring complexity and coverage of software specifications, 2000, Inf. Softw. Technol.

[32] Péter Gács et al. Thermodynamics of computation and information distance, 1993, STOC.

[33] Paul M. B. Vitányi et al. Shannon Information and Kolmogorov Complexity, 2004, ArXiv.

[34] N. Chapin et al. An entropy metric for software maintainability, 1989, Proceedings of the Twenty-Second Annual Hawaii International Conference on System Sciences. Volume II: Software Track.

[35] Shari Lawrence Pfleeger et al. Software Metrics: A Rigorous and Practical Approach, 1998.

[36] David Lorge Parnas et al. Using traces to write abstract specifications for software modules, 1977.

[37] Sandro Morasca et al. Property-Based Software Engineering Measurement, 1996, IEEE Trans. Software Eng.

[38] Ronald de Wolf et al. Algorithmic Clustering of Music Based on String Compression, 2004, Computer Music Journal.

[39] Taghi M. Khoshgoftaar et al. Measuring coupling and cohesion: an information-theory approach, 1999, Proceedings Sixth International Software Metrics Symposium (Cat. No. PR00403).

[40] Taghi M. Khoshgoftaar et al. Modeling the relationship between source code complexity and maintenance difficulty, 1994, Computer.

[41] David Lorge Parnas et al. Precise description and specification of software, 1998.

[42] Mansur H. Samadzadeh et al. Software reuse and information theory based metrics, 1991, Proceedings of the 1991 Symposium on Applied Computing.

[43] Paul M. B. Vitányi et al. Kolmogorov Complexity and Information Theory. With an Interpretation in Terms of Questions and Answers, 2003, J. Log. Lang. Inf.

[44] Claude E. Shannon. A Mathematical Theory of Communication, 1948, Bell Syst. Tech. J.

[45] Leo Hellerman et al. A Measure of Computational Work, 1972, IEEE Transactions on Computers.

[46] Paul M. B. Vitányi et al. Clustering by compression, 2003, IEEE Transactions on Information Theory.

[47] Taghi M. Khoshgoftaar et al. Measuring coupling and cohesion of software modules: an information-theory approach, 2001, Proceedings Seventh International Software Metrics Symposium.

[48] Paul M. B. Vitányi et al. Universal similarity, 2005, IEEE Information Theory Workshop.

[49] Sten F. Andler et al. Predicate path expressions, 1979, POPL.

[50] Stacy J. Prowell et al. Foundations of Sequence-Based Software Specification, 2003, IEEE Trans. Software Eng.

[51] Alfonso Ortega et al. Common Pitfalls Using the Normalized Compression Distance: What to Watch Out for in a Compressor, 2005, Commun. Inf. Syst.

[52] Michael Kohlhase et al. OMDoc - An Open Markup Format for Mathematical Documents [version 1.2], 2006, Lecture Notes in Computer Science.

[53] Elaine J. Weyuker et al. Evaluating Software Complexity Measures, 1988, IEEE Trans. Software Eng.

[54] Norman E. Fenton et al. Measurement: A Necessary Scientific Basis, 2004.

[55] Dennis K. Peters et al. An IDE for software development using tabular expressions, 2007, CASCON.

[56] B. H. Yin et al. The establishment and use of measures to evaluate the quality of software designs, 1978, SIGMETRICS Perform. Evaluation Rev.

[57] Horst Zuse et al. Software complexity: Measures and methods, 1990.

[58] R. N. Chanon. On a measure of program structure, 1974, Symposium on Programming.

[59] Raymond J. Rubey et al. Quantitative measurement of program quality, 1968, ACM '68.