CBCD: Cloned buggy code detector

Developers often copy, or clone, code in order to reuse or modify functionality. When they do so, they also clone any bugs in the original code. Or, different developers may independently make the same mistake. As one example of a bug, multiple products in a product line may use a component in a similar wrong way. This paper makes two contributions. First, it presents an empirical study of cloned buggy code. In a large industrial product line, about 4% of the bugs are duplicated across more than one product or file. In three open source projects (the Linux kernel, the Git version control system, and the PostgreSQL database) we found 282, 33, and 33 duplicated bugs, respectively. Second, this paper presents a tool, CBCD, that searches for code that is semantically identical to given buggy code. CBCD tests graph isomorphism over the Program Dependency Graph (PDG) representation and uses four optimizations. We evaluated CBCD by searching for known clones of buggy code segments in the three projects and compared the results with text-based, token-based, and AST-based code clone detectors, namely Simian, CCFinder, Deckard, and CloneDR. The evaluation shows that CBCD is fast when searching for possible clones of the buggy code in a large system, and it is more precise for this purpose than the other code clone detectors.

[1]  Joe D. Warren,et al.  The program dependence graph and its use in optimization , 1987, TOPL.

[2]  Philip S. Yu,et al.  GPLAG: detection of software plagiarism by program dependence graph analysis , 2006, KDD '06.

[3]  Yuanyuan Zhou,et al.  CP-Miner: finding copy-paste and related bugs in large-scale software code , 2006, IEEE Transactions on Software Engineering.

[4]  Zhendong Su,et al.  Automatic mining of functionally equivalent code fragments via random testing , 2009, ISSTA.

[5]  Benjamin Livshits,et al.  Finding application errors and security flaws using PQL: a program query language , 2005, OOPSLA '05.

[6]  Zhendong Su,et al.  Context-based detection of clone-related bugs , 2007, ESEC-FSE '07.

[7]  Rainer Koschke,et al.  Approximate Code Search in Program Histories , 2011, 2011 18th Working Conference on Reverse Engineering.

[8]  Zhendong Su,et al.  DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones , 2007, 29th International Conference on Software Engineering (ICSE'07).

[9]  Jens Krinke,et al.  Identifying similar code with program dependence graphs , 2001, Proceedings Eighth Working Conference on Reverse Engineering.

[10]  Derek G. Corneil,et al.  The graph isomorphism disease , 1977, J. Graph Theory.

[11]  Rainer Koschke,et al.  Clone Detection Using Abstract Syntax Suffix Trees , 2006, 2006 13th Working Conference on Reverse Engineering.

[12]  Stéphane Ducasse,et al.  A language independent approach for detecting duplicated code , 1999, Proceedings IEEE International Conference on Software Maintenance - 1999 (ICSM'99). 'Software Maintenance for Business Change' (Cat. No.99CB36360).

[13]  Jeffrey Xu Yu,et al.  Matching dependence-related queries in the system dependence graph , 2010, ASE.

[14]  Mario Vento,et al.  A (sub)graph isomorphism algorithm for matching large graphs , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Heejung Kim,et al.  MeCC: memory comparison-based clone detector , 2011, 2011 33rd International Conference on Software Engineering (ICSE).

[16]  Alexander Chatzigeorgiou,et al.  Design Pattern Detection Using Similarity Scoring , 2006, IEEE Transactions on Software Engineering.

[17]  Hoan Anh Nguyen,et al.  Recurring bug fixes in object-oriented programs , 2010, 2010 ACM/IEEE 32nd International Conference on Software Engineering.

[18]  Junfeng Yang,et al.  An empirical study of operating systems errors , 2001, SOSP.

[19]  Brendan D. McKay,et al.  Practical graph isomorphism, II , 2013, J. Symb. Comput..

[20]  Jiong Yang,et al.  Discovering Neglected Conditions in Software by Mining Dependence Graphs , 2008, IEEE Transactions on Software Engineering.

[21]  Junfeng Yang,et al.  Scalable and systematic detection of buggy inconsistencies in source code , 2010, OOPSLA.

[22]  Chanchal Kumar Roy,et al.  Comparison and evaluation of code clone detection techniques and tools: A qualitative approach , 2009, Sci. Comput. Program..

[23]  Giuliano Antoniol,et al.  Comparison and Evaluation of Clone Detection Tools , 2007, IEEE Transactions on Software Engineering.

[24]  Hoan Anh Nguyen,et al.  Complete and accurate clone detection in graph-based models , 2009, 2009 IEEE 31st International Conference on Software Engineering.

[25]  Zhendong Su,et al.  Scalable detection of semantic clones , 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[26]  Brenda S. Baker,et al.  On finding duplication and near-duplication in large software systems , 1995, Proceedings of 2nd Working Conference on Reverse Engineering.

[27]  Hoan Anh Nguyen,et al.  Detection of recurring software vulnerabilities , 2010, ASE.

[28]  Shinji Kusumoto,et al.  CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code , 2002, IEEE Trans. Software Eng..

[29]  Gábor Csárdi,et al.  The igraph software package for complex network research , 2006 .

[30]  Susan Horwitz,et al.  Using Slicing to Identify Duplication in Source Code , 2001, SAS.