Approximate Code Search in Program Histories

A defect often must be corrected not only at one place in the current version of a program but also in many other places and versions, possibly even in different development branches. We therefore need a technique that efficiently locates all approximate matches of an arbitrary defective code fragment in the program's history, since these matches may need to be fixed as well. This paper presents an approximate whole-program code search across multiple releases and branches. We evaluate the technique on real-world defects in several large, realistic programs with multiple releases and branches, and we report runtime measurements and recall for varying levels of allowable differences in the approximate search.
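
To make the notion of "approximate match with a bounded number of allowable differences" concrete, the following is a minimal sketch of the classic dynamic-programming approach to approximate matching with k differences, applied to token sequences. It is only an illustration and is not claimed to be the search technique presented in this paper; the function name `approx_find`, the token lists, and the choice of unit edit costs are all assumptions made for the example.

```python
def approx_find(pattern, text, k):
    """Report where `pattern` occurs in `text` with at most k differences.

    Both arguments are sequences of tokens. A difference is one token
    insertion, deletion, or substitution (unit costs, an assumption).
    Returns a list of (end, distance) pairs, where `end` is the exclusive
    end index in `text` of a matching substring.
    """
    m = len(pattern)
    # col[i] holds the smallest edit distance between pattern[:i] and any
    # substring of text that ends at the current text position.
    col = list(range(m + 1))
    hits = []
    for end, tok in enumerate(text, start=1):
        prev_diag = col[0]
        col[0] = 0  # a match may start at any text position for free
        for i in range(1, m + 1):
            above_left = prev_diag      # value for (i-1, previous position)
            prev_diag = col[i]          # value for (i, previous position)
            cost = 0 if pattern[i - 1] == tok else 1
            col[i] = min(above_left + cost,  # match or substitute
                         prev_diag + 1,      # extra token in text
                         col[i - 1] + 1)     # token missing from text
        if col[m] <= k:
            hits.append((end, col[m]))
    return hits


# Hypothetical defective fragment and a later version with a renamed variable.
defect = ["if", "(", "p", "==", "NULL", ")"]
version = ["x", "=", "1", ";", "if", "(", "q", "==", "NULL", ")", "return", ";"]
print(approx_find(defect, version, 1))
```

With k = 0 the renamed variable prevents any match, while k = 1 tolerates the single substituted token. Running such a scan independently over every file of every release would take time proportional to pattern length times total history size, which is why an efficient whole-history search is needed.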
