Lightweight multi-language syntax transformation with parser parser combinators

Automatically transforming programs is hard, yet critical for automated program refactoring, rewriting, and repair. Multi-language syntax transformation is especially hard due to heterogeneous representations in syntax, parse trees, and abstract syntax trees (ASTs). Our insight is that the problem can be decomposed such that (1) a common grammar expresses the central context-free language (CFL) properties shared by many contemporary languages and (2) open extension points in the grammar allow customizing syntax (e.g., for balanced delimiters) and hooking in smaller parsers to handle language-specific syntax (e.g., for comments). Our key contribution operationalizes this decomposition using a Parser Parser combinator (PPC), a mechanism that generates parsers for matching syntactic fragments in source code by parsing declarative user-supplied templates. This allows our approach to detach from translating input programs to any particular abstract syntax tree representation, and lifts syntax rewriting to a modularly-defined parsing problem. A notable effect is that we skirt the complexity and burden of defining additional translation layers between concrete user input templates and an underlying abstract syntax representation. We demonstrate that these ideas admit efficient and declarative rewrite templates across 12 languages, and validate the effectiveness of our approach by producing correct and desirable lightweight transformations on popular real-world projects (over 50 syntactic changes produced by our approach have been merged into 40+). Our declarative rewrite patterns require an order of magnitude less code compared to analogous implementations in existing, language-specific tools.
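
The idea of parsing a template to obtain a parser can be illustrated with a minimal sketch. This is not the authors' implementation: the hole syntax `:[name]`, the function names (`compile_template`, `rewrite`), and the fixed set of delimiters are all assumptions made for illustration. The sketch covers only the shared balanced-delimiter structure and omits the language-specific hooks (e.g., for comments and string literals) that the abstract describes.

```python
import re

# Hypothetical hole syntax ":[name]"; the abstract does not specify one.
HOLE = re.compile(r":\[(\w+)\]")
# Assumed delimiter pairs; a real system would make these configurable.
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())


def done(src, i, env):
    """Parser that succeeds without consuming input."""
    return i, env


def literal(text):
    """Parser that matches `text` exactly at position i."""
    def parse(src, i, env):
        return (i + len(text), env) if src.startswith(text, i) else None
    return parse


def sequence(first, rest):
    """Combinator: run `first`, then `rest` from where `first` stopped."""
    def parse(src, i, env):
        result = first(src, i, env)
        return None if result is None else rest(src, *result)
    return parse


def hole(name, rest):
    """Parser for a hole: binds the shortest balanced text after which `rest` succeeds."""
    def parse(src, i, env):
        stack, j = [], i
        while True:
            if not stack:  # only try the continuation at nesting depth zero
                result = rest(src, j, {**env, name: src[i:j]})
                if result is not None:
                    return result
            if j >= len(src):
                return None
            ch = src[j]
            if ch in PAIRS:
                stack.append(PAIRS[ch])
            elif ch in CLOSERS and (not stack or stack.pop() != ch):
                return None  # a hole may not cross an unmatched delimiter
            j += 1
    return parse


def compile_template(template):
    """Parse a template and return a parser for the fragments it describes."""
    parts = HOLE.split(template)  # [literal, name, literal, name, ..., literal]
    parser = done
    # Build right to left so that each hole knows its continuation.
    for k in range(len(parts) - 1, -1, -1):
        if k % 2 == 1:
            parser = hole(parts[k], parser)
        elif parts[k]:
            parser = sequence(literal(parts[k]), parser)
    return parser


def rewrite(source, match_template, rewrite_template):
    """Replace every non-overlapping match, scanning left to right."""
    parser = compile_template(match_template)
    out, i = [], 0
    while i < len(source):
        result = parser(source, i, {})
        if result is not None and result[0] > i:
            end, env = result
            out.append(HOLE.sub(lambda m: env[m.group(1)], rewrite_template))
            i = end
        else:
            out.append(source[i])
            i += 1
    return "".join(out)


print(rewrite("x = swap(f(a, b), g[1]);", "swap(:[x], :[y])", "swap(:[y], :[x])"))
```

Because the generated parser tracks only delimiter nesting, the comma inside `f(a, b)` is not mistaken for the argument separator, and no AST for the host language is ever built. The same template would apply unchanged to any language that shares this call syntax.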
