Visualizing and Understanding Code Duplication in Large Software Systems

Code duplication, or code cloning, is a common phenomena in the development of large software systems. Developers have a love-hate relationship with cloning. On one hand, cloning speeds up the development process. On the other hand, clone management is a challenging task as software evolves. Cloning has commonly been considered as undesirable for software maintenance and several research efforts have been devoted to automatically detect clones and eliminate clones aggressively. However, there is little empirical work done to analyze the consequences of cloning with respect to the software quality. Recent studies show that cloning is not necessarily undesirable. Cloning can used to minimize risks and there are cases where cloning is used as a design technique. In this thesis, three visualization techniques are proposed to aid researchers in analyzing cloning in studying large software systems. All of the visualizations abstract and display cloning information at the subsystem level but with different emphases. At the subsystem level, clones can be classified as external clones and internal clones. External clones refer to code duplicates that reside in the same subsystem, whereas external clones are clones that are spread across different subsystems. Software architecture quality attributes such as cohesion and coupling are introduced to contribute to the study of cloning at the architecture level. The Clone Cohesion and Coupling (CCC) Graph and the Clone System Hierarchy (CSH) Graph display the cloning information for one single release. In particular, the CCC Graph highlights the amount of internal and external cloning for each subsystems; whereas the CSH Graph focuses more on the details of the spread of cloning. Finally, the Clone System Evolution (CSE) Graph shows the evolution of cloning over a period of time.

[1]  Junfeng Yang,et al.  An empirical study of operating systems errors , 2001, SOSP.

[2]  Giuliano Antoniol,et al.  Investigating large software system evolution: the Linux kernel , 2002, Proceedings 26th Annual International Computer Software and Applications.

[3]  Andrew Begel,et al.  Managing Duplicated Code with Linked Editing , 2004, 2004 IEEE Symposium on Visual Languages - Human Centric Computing.

[4]  Eric S. Raymond,et al.  The cathedral and the bazaar - musings on Linux and Open Source by an accidental revolutionary , 2001 .

[5]  Jesús M. González-Barahona,et al.  Evolution and growth in large libre software projects , 2005, Eighth International Workshop on Principles of Software Evolution (IWPSE'05).

[6]  Miryung Kim,et al.  An ethnographic study of copy and paste programming practices in OOPL , 2004, Proceedings. 2004 International Symposium on Empirical Software Engineering, 2004. ISESE '04..

[7]  J. Howard Johnson,et al.  Navigating the textual redundancy web in legacy source , 1996, CASCON.

[8]  Shinji Kusumoto,et al.  CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code , 2002, IEEE Trans. Software Eng..

[9]  Mary Shaw,et al.  An Introduction to Software Architecture , 1993, Advances in Software Engineering and Knowledge Engineering.

[10]  Damith C. Rajapakse,et al.  Beyond templates: a study of clones in the STL and some general implications , 2005, Proceedings. 27th International Conference on Software Engineering, 2005. ICSE 2005..

[11]  Michael W. Godfrey,et al.  "Cloning Considered Harmful" Considered Harmful , 2006, 2006 13th Working Conference on Reverse Engineering.

[12]  Stéphane Ducasse,et al.  Insights into system-wide code duplication , 2004, 11th Working Conference on Reverse Engineering.

[13]  Spiros Mancoridis,et al.  On the automatic modularization of software systems using the Bunch tool , 2006, IEEE Transactions on Software Engineering.

[14]  Richard C. Holt,et al.  Linux as a case study: its extracted software architecture , 1999, Proceedings of the 1999 International Conference on Software Engineering (IEEE Cat. No.99CB37002).

[15]  Ettore Merlo,et al.  Assessing the benefits of incorporating function clone detection in a development process , 1997, 1997 Proceedings International Conference on Software Maintenance.

[16]  Václav Rajlich,et al.  Removing clones from the code , 1999, J. Softw. Maintenance Res. Pract..

[17]  Edward M. Reingold,et al.  Graph drawing by force‐directed placement , 1991, Softw. Pract. Exp..

[18]  Kostas Kontogiannis,et al.  Evaluation experiments on the detection of programming patterns using software metrics , 1997, Proceedings of the Fourth Working Conference on Reverse Engineering.

[19]  James R. Cordy Comprehending Reality: Practical Challenges to Software Maintenance Automation , 2003 .

[20]  J. Howard Johnson,et al.  Visualizing textual redundancy in legacy source , 1994, CASCON.

[21]  Michael W. Godfrey,et al.  Supporting the analysis of clones in software systems , 2006, J. Softw. Maintenance Res. Pract..

[22]  Jonathan Helfman,et al.  Dotplot Patterns: A Literal Look at Pattern Languages , 1996, Theory Pract. Object Syst..

[23]  Arie van Deursen,et al.  An evaluation of clone detection techniques for crosscutting concerns , 2004, 20th IEEE International Conference on Software Maintenance, 2004. Proceedings..

[24]  Brenda S. Baker,et al.  On finding duplication and near-duplication in large software systems , 1995, Proceedings of 2nd Working Conference on Reverse Engineering.

[25]  David A. Wheeler More Than a Gigabuck: Estimating GNU/Linux''s Size , 2002, WWW 2002.

[26]  Miryung Kim,et al.  An empirical study of code clone genealogies , 2005, ESEC/FSE-13.

[27]  S. Kusumoto,et al.  Gemini: Code Clone Analysis Tool , 2002 .

[28]  Dewayne E. Perry,et al.  Metrics and laws of software evolution-the nineties view , 1997, Proceedings Fourth International Software Metrics Symposium.

[29]  Stéphane Ducasse,et al.  A language independent approach for detecting duplicated code , 1999, Proceedings IEEE International Conference on Software Maintenance - 1999 (ICSM'99). 'Software Maintenance for Business Change' (Cat. No.99CB36360).

[30]  Susan Horwitz,et al.  Eliminating duplication in source code via procedure extraction , 2002 .

[31]  Elizabeth Burd,et al.  Evaluating clone detection tools for use during preventative maintenance , 2002, Proceedings. Second IEEE International Workshop on Source Code Analysis and Manipulation.

[32]  Akito Monden,et al.  Software quality analysis by code clones in industrial legacy software , 2002, Proceedings Eighth IEEE Symposium on Software Metrics.

[33]  YangJunfeng,et al.  An empirical study of operating systems errors , 2001 .

[34]  Michael W. Godfrey,et al.  Cloning by accident: an empirical study of source code cloning across software systems , 2005, 2005 International Symposium on Empirical Software Engineering, 2005..

[35]  Michael W. Godfrey,et al.  Evolution in open source software: a case study , 2000, Proceedings 2000 International Conference on Software Maintenance.