Detecting Java Code Clones Based on Bytecode Sequence Alignment

When source code is copied and pasted, with or without modification, a software system accumulates many identical or similar code fragments, which are called code clones. Because code clones are believed to harm software maintainability, numerous approaches and techniques have been proposed for detecting them. However, most of these operate on source code, and only a few use bytecode. In this paper, we introduce an approach based on Java bytecode that consists mainly of two steps: bytecode sequence alignment and similarity score comparison. In particular, we apply the Smith–Waterman algorithm to align bytecode sequences for precise matching. Moreover, we consider the similarity of instruction sequences and the similarity of method call sequences separately, which improves the effectiveness of clone detection. We conducted an extensive experiment on five open-source software systems to evaluate the proposed approach. The results show that our approach outperforms other state-of-the-art techniques.
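The abstract names two ingredients: Smith–Waterman local alignment of bytecode sequences, and separate similarity scores for instruction sequences and method call sequences. The Python sketch below illustrates both under stated assumptions. It uses a linear gap penalty, and the scoring values (match +2, mismatch -1, gap -1), the normalization by the shorter sequence, and the equal-weight combination in the hypothetical `clone_score` function are illustrative choices, not parameters taken from the paper. Each method is assumed to be already extracted from a class file as a list of JVM opcode mnemonics plus a list of invoked method names.

```python
from typing import Sequence

# Hypothetical scoring parameters (not taken from the paper).
MATCH, MISMATCH, GAP = 2, -1, -1


def smith_waterman(a: Sequence[str], b: Sequence[str]) -> int:
    """Return the best local alignment score between token sequences a and b."""
    # h[i][j] = best score of a local alignment ending at a[i-1] and b[j-1].
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            up = h[i - 1][j] + GAP      # a[i-1] aligned against a gap
            left = h[i][j - 1] + GAP    # b[j-1] aligned against a gap
            # Clamping at 0 lets an alignment start anywhere (the "local" part).
            h[i][j] = max(0, diag, up, left)
            best = max(best, h[i][j])
    return best


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Normalize the local score to [0, 1] by the best score the shorter sequence allows."""
    if not a or not b:
        return 0.0  # assumption: an empty sequence shares nothing with any other
    return smith_waterman(a, b) / (MATCH * min(len(a), len(b)))


def clone_score(instr_a, instr_b, calls_a, calls_b, w: float = 0.5) -> float:
    """Hypothetical weighted combination of instruction and method call similarity."""
    return w * similarity(instr_a, instr_b) + (1 - w) * similarity(calls_a, calls_b)


# Example: `return Math.abs(this.x + y)` vs `return Math.abs(this.x + y + 1)`.
instr_a = ["aload_0", "getfield", "iload_1", "iadd", "invokestatic", "ireturn"]
instr_b = ["aload_0", "getfield", "iload_1", "iadd", "iconst_1", "iadd",
           "invokestatic", "ireturn"]
calls_a = ["java/lang/Math.abs"]
calls_b = ["java/lang/Math.abs"]

print(round(clone_score(instr_a, instr_b, calls_a, calls_b), 3))  # prints 0.917
```

Here the two extra instructions in `instr_b` cost two gap penalties, so the instruction similarity is 10/12, while the identical call sequences score 1.0. Scoring the call sequences separately keeps a small arithmetic edit from hiding the fact that both methods invoke the same API; a real detector would compare the combined score against a tuned threshold.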
