Classifying Obstructive and Nonobstructive Code Clones of Type I Using Simplified Classification Scheme: A Case Study

Code cloning is present in many commercial and open source software products. Multiple methods for detecting code clones have been developed, and clone detection is often built into modern quality assurance tools used in industry. There is, however, no consensus on whether the detected clones harm the product, and they are therefore often left unmanaged in the code base. In this paper we investigate how obstructive Type I code clones (exact duplicates of code fragments) are in large software systems from the perspective of product quality after release. We conduct a case study at Ericsson on three of its large products, which handle mobile data traffic. We show how automated analogy-based classification can reduce the effort required to decide whether a clone pair should be refactored or left untouched. The automated method classifies 96% of Type I clones (both algorithms and data declarations), leaving the remaining 4% for manual classification. The results show that cloning is common in the studied commercial software, but that only 1% of these clones are potentially obstructive and could jeopardize the quality of the product if left unmanaged.
