CCFinderSW: Clone Detection Tool with Flexible Multilingual Tokenization

Many tools have been developed to detect code clones in source code. However, existing clone detection tools support only a limited number of programming languages and provide no easy extension mechanism for handling additional languages. From our experience in industry-university collaboration, we found that many practitioners need to analyze source code written in a variety of languages. In this paper, we propose CCFinderSW, a clone detection tool with an extension mechanism that handles additional languages on demand from practitioners.
