Collecting and leveraging a benchmark of build system clones to aid in quality assessments

Build systems specify how sources are transformed into deliverables, and hence must be carefully maintained to ensure that deliverables are assembled correctly. Similar to source code, build systems tend to grow in complexity unless specifications are refactored. This paper describes how clone detection can aid in quality assessments that determine if and where build refactoring effort should be applied. We gauge cloning rates in build systems by collecting and analyzing a benchmark comprising 3,872 build systems. Analysis of the benchmark reveals that: (1) build systems tend to have higher cloning rates than other software artifacts, (2) recent build technologies tend to be more prone to cloning, especially of configuration details like API dependencies, than older technologies, and (3) build systems that have fewer clones achieve higher levels of reuse via mechanisms not offered by build technologies. Our findings aided in refactoring a large industrial build system containing 1.1 million lines.
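To make the notion of a cloning rate concrete, the following is a minimal sketch of a line-based measure of clone coverage over build specifications. It is not the detector or the metric definition used in the study. The function name `clone_coverage`, the `min_lines` threshold, and the sample file contents are all assumptions made for illustration.

```python
from collections import defaultdict


def clone_coverage(files, min_lines=3):
    """Fraction of non-blank lines that fall inside a duplicated window.

    `files` maps a (hypothetical) build file name to its text. A window is
    `min_lines` consecutive non-blank lines, compared after stripping
    leading and trailing whitespace. This is an illustrative measure only.
    """
    # Normalize: drop blank lines and surrounding whitespace.
    normalized = {
        name: [ln.strip() for ln in text.splitlines() if ln.strip()]
        for name, text in files.items()
    }

    # Index every window of `min_lines` consecutive lines by its content.
    windows = defaultdict(list)
    for name, lines in normalized.items():
        for i in range(len(lines) - min_lines + 1):
            windows[tuple(lines[i:i + min_lines])].append((name, i))

    # Mark every line covered by a window that occurs more than once.
    cloned = set()
    for occurrences in windows.values():
        if len(occurrences) > 1:
            for name, start in occurrences:
                cloned.update((name, j) for j in range(start, start + min_lines))

    total = sum(len(lines) for lines in normalized.values())
    return len(cloned) / total if total else 0.0


# Hypothetical example: two build files that repeat one dependency block.
sample = {
    "a/pom.xml": "<dependency>\n<groupId>org.x</groupId>\n"
                 "<artifactId>y</artifactId>\n</dependency>\n<name>a</name>",
    "b/pom.xml": "<dependency>\n<groupId>org.x</groupId>\n"
                 "<artifactId>y</artifactId>\n</dependency>\n<name>b</name>",
}
print(clone_coverage(sample))  # 0.8
```

In this sample, the four-line dependency block is repeated in both files, so 8 of the 10 lines count as cloned. A production detector would typically work on tokens rather than raw lines and would treat overlapping repeats within a single file more carefully.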
