Exercises in Free Syntax. Syntax Definition, Parsing, and Assimilation of Language Conglomerates

In modern software development, the use of multiple software languages to constitute a single application is ubiquitous. Despite the omnipresent use of combinations of languages, the principles and techniques for using languages together are ad hoc, unfriendly to programmers, and result in a poor level of integration. We work towards a principled and generic solution to language extension by studying the applicability of modular syntax definition, scannerless parsing, generalized parsing algorithms, and program transformations.

We describe MetaBorg, a method for providing concrete syntax for domain abstractions to application programmers. Since object-oriented languages are designed for extensibility and reuse, their language constructs are often sufficient for expressing domain abstractions at the semantic level. However, they do not provide the right abstractions at the syntactic level. The MetaBorg method consists of embedding domain-specific languages in a general-purpose host language and assimilating the embedded domain code into the surrounding host code. Instead of extending the implementation of the host language, the assimilation phase implements domain abstractions in terms of existing APIs, leaving the host language undisturbed.

We present a solution to injection vulnerabilities. Software written in one language often needs to construct sentences in another language, such as SQL queries, XML output, or shell command invocations. This is almost always done using unhygienic string manipulation. A client can then supply specially crafted input that causes the constructed sentence to be interpreted in an unintended way, leading to an injection attack. We describe a more natural style of programming that yields code that is impervious to injections by construction. Our approach embeds the grammars of the guest languages into that of the host language and automatically generates code that maps the embedded language to constructs in the host language that reconstruct the embedded sentences, adding escaping functions where appropriate.

We study AspectJ as a typical example of a language conglomerate, i.e., a language composed of a number of separate languages with different syntactic styles. We show that combining the lexical syntax of these languages leads to considerable complexity in the lexical states that must be processed, and we show how scannerless parsing elegantly addresses this. We present the design of a modular, extensible, and formal definition of the lexical and context-free aspects of the AspectJ syntax. We introduce grammar mixins, which allow the declarative definition of keyword policies and the combination of extensions.

We introduce separate compilation of grammars to enable deployment of languages as plugins to a compiler. Current extensible compilers focus on source-level extensibility, which requires users to compile the compiler with a specific configuration of extensions. A compound parser needs to be generated for every combination. We introduce an algorithm for parse table composition to support separate compilation of grammars to parse table components. Parse table components can be composed (linked) efficiently at runtime, i.e., just before parsing. For realistic language combination scenarios involving grammars for real languages, our parse table composition algorithm is an order of magnitude faster than computation of the parse table for the combined grammars, making online language composition feasible.
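To make the injection problem described above concrete, the following is a minimal sketch in Python. It is not the generated code the abstract refers to, and it does not embed any grammar. It only contrasts string concatenation with a structured representation that escapes spliced values when the sentence is reconstructed. The names `Raw`, `StrLit`, `render`, and `unsafe_query`, as well as the `users` table, are hypothetical, and quote doubling is assumed as the escaping rule for SQL string literals (real dialects may need more).

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Raw:
    """A fixed SQL fragment written by the programmer (hypothetical node type)."""
    text: str


@dataclass(frozen=True)
class StrLit:
    """A host-language value spliced in as an SQL string literal (hypothetical node type)."""
    value: str


Part = Union[Raw, StrLit]


def escape_sql_string(value: str) -> str:
    # Assumed escaping rule: double every single quote inside the literal.
    return value.replace("'", "''")


def render(parts: List[Part]) -> str:
    """Reconstruct the SQL sentence, escaping every spliced value."""
    out = []
    for part in parts:
        if isinstance(part, Raw):
            out.append(part.text)
        else:
            out.append("'" + escape_sql_string(part.value) + "'")
    return "".join(out)


def unsafe_query(name: str) -> str:
    # Unhygienic string manipulation: the input becomes part of the SQL syntax.
    return "SELECT * FROM users WHERE name = '" + name + "'"


malicious = "x' OR '1'='1"
unsafe = unsafe_query(malicious)
safe = render([Raw("SELECT * FROM users WHERE name = "), StrLit(malicious)])

print(unsafe)  # SELECT * FROM users WHERE name = 'x' OR '1'='1'
print(safe)    # SELECT * FROM users WHERE name = 'x'' OR ''1''=''1'
```

In the concatenated version the crafted input changes the structure of the query, whereas in the structured version it remains a single string literal. The approach in the abstract goes further by deriving such structure from the embedded guest-language syntax rather than asking the programmer to build it by hand.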
