Code similarity detection through control statement and program features

Abstract: Software clone detection is an emerging research area in software engineering. Software systems undergo continuous source code modification to improve performance, which can introduce code redundancy. A code clone (duplicate code) is a piece of code that appears several times in a software system as a result of copy-paste activity or reuse of existing software. Code clones are a prime subject in software evolution: detecting them as software evolves may improve performance and reduce maintenance cost and effort. Because software clones are a universal problem in large-scale programming environments, this paper proposes two metric-based approaches that detect code clones by comparing (i) control statement features and (ii) program features such as the different types of statements, operators, and operands. To demonstrate the effectiveness of the proposed approaches, extensive experiments are conducted on two datasets: the C projects of Bellon's benchmark dataset and student lab programs (SLP). The methods efficiently identify similar functional clones. The proposed models not only find the similarity of whole programs but are also intelligent enough to highlight similar code segments across program files.
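As a rough illustration of what comparing control statement features can mean in practice, the sketch below counts control keywords in two C source strings and compares the resulting count vectors with cosine similarity. This is not the paper's method: the keyword list, the regex-based cleanup, the function names, and the choice of cosine similarity are all assumptions made for the example.

```python
import re
from collections import Counter
from math import sqrt

# Assumed feature set: C control keywords (hypothetical, not the paper's list).
CONTROL_KEYWORDS = ("if", "else", "for", "while", "do", "switch",
                    "case", "break", "continue", "return", "goto")

# Rough removal of comments and string/char literals so that keywords
# inside them are not counted. A real tool would use a proper C lexer.
_NOISE = re.compile(
    r'/\*.*?\*/|//[^\n]*|"(?:\\.|[^"\\])*"|\'(?:\\.|[^\'\\])*\'',
    re.DOTALL,
)


def control_features(source):
    """Return control-keyword counts, in CONTROL_KEYWORDS order."""
    cleaned = _NOISE.sub(" ", source)
    words = re.findall(r"[A-Za-z_]\w*", cleaned)
    counts = Counter(w for w in words if w in CONTROL_KEYWORDS)
    return [counts[k] for k in CONTROL_KEYWORDS]


def cosine_similarity(a, b):
    """Cosine similarity of two count vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two functions that differ only in identifier names, plus an unrelated one.
prog_a = "int sum(int n){int s=0; for(int i=0;i<n;i++){ if(i%2==0) s+=i; } return s;}"
prog_b = "int total(int k){int t=0; /* if while */ for(int j=0;j<k;j++){ if(j%2==0) t+=j; } return t;}"
prog_c = "int f(int x){ while(x>0){ x--; } return x; }"

sim_ab = cosine_similarity(control_features(prog_a), control_features(prog_b))
sim_ac = cosine_similarity(control_features(prog_a), control_features(prog_c))
print(f"a vs b: {sim_ab:.3f}")
print(f"a vs c: {sim_ac:.3f}")
```

Because renaming identifiers leaves the control keyword counts unchanged, `prog_a` and `prog_b` score as near-identical, while `prog_c` scores lower. Plain counts ignore the order and nesting of statements, so a sketch like this can only flag candidate pairs; it does not locate the matching segments.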
