The vision of software clone management: Past, present, and future (Keynote paper)

Duplicated code, or code clones, are a kind of code smell with both positive and negative impacts on the development and maintenance of software systems. Past software clone research focused mostly on the detection and analysis of code clones, whereas research in recent years has extended to the whole spectrum of clone management. In the last decade, three surveys have appeared in the literature, covering the detection, analysis, and evolutionary characteristics of code clones. This paper presents a comprehensive survey of the state of the art in clone management, with an in-depth investigation of clone management activities beyond detection and analysis (e.g., tracing, refactoring, and cost-benefit analysis). This is the first survey on clone management. In it, we point out the achievements so far and reveal the avenues of further research needed to move towards an integrated clone management system. We believe that this survey covers the area of clone management well and that it may serve as a roadmap for future research in the area.
