Multilingual Detection of Code Clones Using ANTLR Grammar Definitions

Many tools have been developed to detect code clones in source code. However, existing clone detection tools support only a limited number of programming languages and provide no easy extension mechanism for handling additional languages. From our experience in industry/university collaboration, we found that many practitioners need to analyze source code written in a variety of languages. In this paper, we propose an approach for the multilingual detection of code clones that uses grammar files for the parser generator ANTLR. We extended the clone detection tool CCFinderSW with the proposed approach and applied the extended CCFinderSW to ANTLR grammar files for 43 languages. As a result, the extended CCFinderSW correctly analyzed the files for 39 of the 43 languages.
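To give a flavor of what reading language information from an ANTLR grammar file can look like, here is a minimal sketch. The abstract does not say which parts of a grammar the extended CCFinderSW reads, so this assumes, purely for illustration, that the information of interest is the set of lexer rules whose tokens are discarded or hidden (typically comments and whitespace). The names `LEXER_RULE`, `HIDDEN_COMMAND`, and `hidden_rules` are hypothetical and not part of CCFinderSW or ANTLR. Only the `-> skip` and `-> channel(HIDDEN)` lexer commands are standard ANTLR 4 syntax.

```python
import re

# One lexer rule: upper-case-initial name, ':', body, ';'.
# The body steps over quoted literals and [...] character sets so that a ';'
# inside them does not end the rule early. Comments inside the grammar file
# itself are NOT handled; this is a sketch, not a full ANTLR parser.
LEXER_RULE = re.compile(
    r"^\s*(?:fragment\s+)?([A-Z][A-Za-z0-9_]*)\s*:"
    r"((?:'(?:\\.|[^'\\])*'|\[(?:\\.|[^\]\\])*\]|[^;'\[])*);",
    re.MULTILINE,
)

# ANTLR 4 lexer commands that keep a token away from the parser.
HIDDEN_COMMAND = re.compile(r"->\s*(?:skip|channel\s*\(\s*HIDDEN\s*\))")


def hidden_rules(grammar_text: str) -> list[str]:
    """Return names of lexer rules that are skipped or sent to the hidden channel."""
    return [
        name
        for name, body in LEXER_RULE.findall(grammar_text)
        if HIDDEN_COMMAND.search(body)
    ]


# A tiny made-up grammar used only to exercise the function.
SAMPLE_GRAMMAR = r"""
grammar Demo;
prog : stmt* EOF ;
stmt : ID '=' INT SEMI ;
SEMI : ';' ;
ID : [a-zA-Z_]+ ;
INT : [0-9]+ ;
LINE_COMMENT : '//' ~[\r\n]* -> skip ;
BLOCK_COMMENT : '/*' .*? '*/' -> channel(HIDDEN) ;
WS : [ \t\r\n]+ -> skip ;
"""

if __name__ == "__main__":
    print(hidden_rules(SAMPLE_GRAMMAR))
```

A regex scan like this is deliberately shallow. A tool that needs to be robust across many grammars would more plausibly parse the grammar file properly, but the sketch shows why a grammar definition is a convenient source of per-language lexical information.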
