Context-based detection of clone-related bugs

Studies show that programs contain much similar code, commonly known as clones. One of the main reasons for introducing clones is programmers' tendency to copy and paste code to quickly duplicate functionality. We commonly believe that clones can make programs difficult to maintain and introduce subtle bugs. Although much research has proposed techniques for detecting and removing clones to improve software maintainability, little has considered how to detect latent bugs introduced by clones. In this paper, we introduce a general notion of context-based inconsistencies among clones and develop an efficient algorithm to detect such inconsistencies for locating bugs. We have implemented our algorithm and evaluated it on large open source projects including the latest versions of the Linux kernel and Eclipse. We have discovered many previously unknown bugs and programming style issues in both projects (with 57 for the Linux kernel and 38 for Eclipse). We have also categorized the bugs and style issues and noticed that they exhibit diverse characteristics and are difficult to detect with any single existing bug detection technique. We believe that our approach complements well these existing techniques.

[1]  Miryung Kim,et al.  An empirical study of code clone genealogies , 2005, ESEC/FSE-13.

[2]  Susan Horwitz,et al.  Using Slicing to Identify Duplication in Source Code , 2001, SAS.

[3]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[4]  Ettore Merlo,et al.  Assessing the benefits of incorporating function clone detection in a development process , 1997, 1997 Proceedings International Conference on Software Maintenance.

[5]  Nicole Immorlica,et al.  Locality-sensitive hashing scheme based on p-stable distributions , 2004, SCG '04.

[6]  Dawson R. Engler,et al.  Using Redundancies to Find Errors , 2003, IEEE Trans. Software Eng..

[7]  Stan Jarzabek,et al.  Eliminating redundancies with a "composition with adaptation" meta-programming technique , 2003, ESEC/FSE-11.

[8]  Dawson R. Engler,et al.  Bugs as deviant behavior: a general approach to inferring errors in systems code , 2001, SOSP.

[9]  Nicholas Nethercote,et al.  Valgrind: a framework for heavyweight dynamic binary instrumentation , 2007, PLDI '07.

[10]  Michael W. Godfrey,et al.  "Cloning Considered Harmful" Considered Harmful , 2006, 2006 13th Working Conference on Reverse Engineering.

[11]  James R. Larus,et al.  Mining specifications , 2002, POPL '02.

[12]  Joe D. Warren,et al.  The program dependence graph and its use in optimization , 1987, TOPL.

[13]  Brenda S. Baker,et al.  Parameterized Duplication in Strings: Algorithms and an Application to Software Maintenance , 1997, SIAM J. Comput..

[14]  Sriram K. Rajamani,et al.  The SLAM project: debugging system software via static analysis , 2002, POPL '02.

[15]  Mark Lillibridge,et al.  Extended static checking for Java , 2002, PLDI '02.

[16]  Yuanyuan Zhou,et al.  CP-Miner: A Tool for Finding Copy-paste and Related Bugs in Operating System Code , 2004, OSDI.

[17]  Dawson R. Engler,et al.  From uncertainty to belief: inferring the specification within , 2006, OSDI '06.

[18]  Jens Krinke,et al.  Identifying similar code with program dependence graphs , 2001, Proceedings Eighth Working Conference on Reverse Engineering.

[19]  Sorin Lerner,et al.  ESP: path-sensitive program verification in polynomial time , 2002, PLDI '02.

[20]  Brenda S. Baker,et al.  On finding duplication and near-duplication in large software systems , 1995, Proceedings of 2nd Working Conference on Reverse Engineering.

[21]  Dietmar Seipel,et al.  Clone detection in source code by frequent itemset techniques , 2004 .

[22]  Zhenmin Li,et al.  PR-Miner: automatically extracting implicit programming rules and detecting violations in large software code , 2005, ESEC/FSE-13.

[23]  Joe D. Warren,et al.  The program dependence graph and its use in optimization , 1984, TOPL.

[24]  Dean W. Gonzalez,et al.  “=” considered harmful , 1991, ALET.

[25]  Weishan Zhang,et al.  XVCL: XML-based variant configuration language , 2003, 25th International Conference on Software Engineering, 2003. Proceedings..

[26]  Zhendong Su,et al.  DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones , 2007, 29th International Conference on Software Engineering (ICSE'07).

[27]  Stan Jarzabek,et al.  Using Server Pages to Unify Clones in Web Applications: A Trade-Off Analysis , 2007, 29th International Conference on Software Engineering (ICSE'07).

[28]  Stan Jarzabek,et al.  Detecting higher-level similarity patterns in programs , 2005, ESEC/FSE-13.

[29]  Thomas A. Henzinger,et al.  Lazy abstraction , 2002, POPL '02.

[30]  Alexander Aiken,et al.  Scalable error detection using boolean satisfiability , 2005, POPL '05.

[31]  Shinji Kusumoto,et al.  CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code , 2002, IEEE Trans. Software Eng..

[32]  Isil Dillig,et al.  Static error detection using semantic inconsistency inference , 2007, PLDI '07.

[33]  I.D. Baxter,et al.  DMS/spl reg/: program transformations for practical scalable software evolution , 2004, Proceedings. 26th International Conference on Software Engineering.