A Differential Testing Approach for Evaluating Abstract Syntax Tree Mapping Algorithms

Abstract syntax tree (AST) mapping algorithms are widely used to analyze changes in source code. Despite their foundational role, little effort has been made to evaluate their accuracy, i.e., the extent to which an algorithm captures the evolution of code. We observe that a program element often has only one best-mapped program element. Based on this observation, we propose a hierarchical approach that automatically compares the similarity of the statements and tokens mapped by different algorithms. From this comparison, we determine whether each of the compared algorithms generates inaccurate mappings for a statement or its tokens. We invite 12 external experts to judge, for 200 statements, whether three commonly used AST mapping algorithms generate accurate mappings for each statement and its tokens. Based on the experts' feedback, our approach achieves a precision of 0.98–1.00 and a recall of 0.65–0.75. Furthermore, we conduct a large-scale study on a dataset of ten Java projects containing a total of 263,165 file revisions. Our approach determines that GumTree, MTDiff, and IJM generate inaccurate mappings for 20%–29%, 25%–36%, and 21%–30% of the file revisions, respectively. These experimental results show that state-of-the-art AST mapping algorithms still need improvement.
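
The core idea of comparing mappings across algorithms can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the similarity measure or the hierarchy, so the token-level Dice coefficient, the function names `token_similarity` and `flag_inaccurate`, and the algorithm labels `A`, `B`, `C` are all assumptions made for illustration. The sketch handles only the statement level: for one source statement, it scores the target statement each algorithm mapped it to and flags any algorithm whose mapping is less similar than the best one found.

```python
from collections import Counter


def token_similarity(a: str, b: str) -> float:
    """Dice coefficient over whitespace-separated token multisets.

    Assumed stand-in measure; the paper's actual measure is not given here.
    """
    ca, cb = Counter(a.split()), Counter(b.split())
    total = sum(ca.values()) + sum(cb.values())
    if total == 0:
        return 1.0  # two empty statements are treated as identical
    overlap = sum((ca & cb).values())  # multiset intersection size
    return 2 * overlap / total


def flag_inaccurate(source_stmt: str, mappings: dict) -> list:
    """Return the algorithms whose mapped statement is not the best match.

    mappings: hypothetical algorithm label -> mapped target statement,
    or None when the algorithm left the statement unmapped.
    """
    scores = {
        name: 0.0 if target is None else token_similarity(source_stmt, target)
        for name, target in mappings.items()
    }
    best = max(scores.values())
    # Differential test: a strictly better mapping from another algorithm
    # is evidence that this algorithm's mapping is inaccurate.
    return sorted(name for name, score in scores.items() if score < best)


# Invented example data, not output from any real tool.
suspects = flag_inaccurate(
    "int count = 0 ;",
    {"A": "int count = 1 ;", "B": "String name = null ;", "C": "int count = 1 ;"},
)
print(suspects)
```

Because the check only flags an algorithm when another algorithm finds a better mapping, it cannot detect cases where all algorithms make the same mistake. This is consistent with the reported profile of high precision and lower recall.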
