Cloning practices: Why developers clone and what can be changed

Code clones are similar code segments. Researchers have proposed many techniques to detect, understand, and eliminate code clones. However, because the reasons behind cloning practices are poorly understood, especially from personal and organizational perspectives, little effective support can be provided to alleviate the maintenance problems that code clones cause. In this paper, we report an industrial study that investigates the reasons for cloning practices in large-scale software development from technical, personal, and organizational perspectives. Our study combines code analysis, a questionnaire survey, and interviews with developers, and gathers solid empirical data on how and why developers clone during the different phases of the clone lifecycle in industrial development. The results suggest that cloning is not simply a technical issue; it must be interpreted and understood in the larger context in which code clones occur and evolve. Within this context, there are several adjustable factors and two critical points that affect the introduction, existence, and removal of clones. These adjustable factors and critical points reveal opportunities to improve cloning practices in industrial development from technical, personal, and organizational perspectives.
