Improving Code Clone Detection Accuracy and Efficiency based on Code Complexity Analysis

Code cloning, the copying and reuse of code fragments with appropriate modifications, is a common activity in software development. In the era of big code, some code clone detection techniques no longer scale to large software systems or repositories. Filtering the candidate code before detection can significantly improve the efficiency of code clone detection and can also improve its accuracy. In this paper, we propose CCFilter, a code clone filtering tool based on code complexity analysis. CCFilter analyzes the code complexity of every function and, before clone detection runs, filters the functions to be checked against a complexity threshold. We evaluate CCFilter's performance in a set of experiments. The results show that CCFilter is more accurate and more efficient than a filtering strategy based on code size, and that filtering improves the scalability and efficiency of code clone detection.
