Scanning and Parsing Languages with Ambiguities and Constraints: The Lamb and Fence Algorithms

Traditional language processing tools constrain language designers to specific kinds of grammars. In contrast, model-based language processing tools decouple language design from language processing. These tools allow lexical and syntactic ambiguities in language specifications and let designers declare constraints that resolve them. As a result, they require scanners and parsers that can handle context-free grammars, cope with ambiguities, and enforce disambiguation constraints. In this paper, we present Lamb and Fence. Lamb is a scanning algorithm that supports ambiguous token definitions and the specification of custom pattern matchers and constraints. Fence is a chart parsing algorithm that supports ambiguous context-free grammars and the definition of constraints on associativity, composition, and precedence, as well as custom constraints. Together, Lamb and Fence enable the implementation of the ModelCC model-based language processing tool.
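To make the idea of ambiguous token definitions concrete, the following minimal sketch enumerates every tokenization of an input under overlapping token patterns and optionally prunes them with a precedence constraint. It is an illustration only, not the Lamb algorithm itself: the token names (Real, Integer, Dot), the PRECEDES table, and the functions tokens_at and tokenizations are hypothetical, and each pattern contributes only its longest match at a given position.

```python
import re

# Hypothetical token definitions (assumptions for illustration only).
TOKEN_PATTERNS = [
    ("Real", re.compile(r"\d+\.\d+")),
    ("Integer", re.compile(r"\d+")),
    ("Dot", re.compile(r"\.")),
]

# Hypothetical precedence constraint: a Real token starting at a position
# suppresses an Integer token starting at that same position.
PRECEDES = {"Real": {"Integer"}}


def tokens_at(text, pos, use_precedence=False):
    """Return (name, lexeme, end) for every token type that matches at pos."""
    found = []
    for name, pattern in TOKEN_PATTERNS:
        m = pattern.match(text, pos)
        if m and m.end() > pos:
            found.append((name, m.group(), m.end()))
    if use_precedence:
        # Collect the token types that are preceded by some type found here.
        suppressed = set()
        for name, _, _ in found:
            suppressed |= PRECEDES.get(name, set())
        found = [t for t in found if t[0] not in suppressed]
    return found


def tokenizations(text, pos=0, use_precedence=False):
    """Enumerate all token sequences that cover text[pos:] completely."""
    if pos == len(text):
        return [[]]
    results = []
    for name, lexeme, end in tokens_at(text, pos, use_precedence):
        for rest in tokenizations(text, end, use_precedence):
            results.append([(name, lexeme)] + rest)
    return results


if __name__ == "__main__":
    for seq in tokenizations("1.5"):
        print(seq)
    print(tokenizations("1.5", use_precedence=True))
```

Without the constraint, the input "1.5" yields two readings (one Real token, or Integer, Dot, Integer); with the hypothetical precedence rule, only the Real reading survives. A parser that accepts ambiguous input could instead receive both readings and discard the ones that do not lead to a valid parse.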
