The Semantics of Parsing with Semantic Actions

The recovery of structure from flat sequences of input data is a problem that almost all programs need to solve. Computer science has developed a wide array of declarative languages for describing the structure of languages, usually based on the context-free grammar formalism, and parser generators exist that produce efficient parsers from these descriptions. However, when faced with a parsing problem, most programmers opt for ad-hoc hand-coded solutions, or use parser combinator libraries to construct parsing functions. This paper develops a hybrid approach, treating grammars as collections of active right-hand sides, indexed by a set of non-terminals. Active right-hand sides are built using the standard monadic parser combinators and allow the consumed input to affect the language being parsed, which permits the precise description of the realistic languages that arise in programming. We carefully investigate the semantics of grammars with active right-hand sides, not just from the point of view of language acceptance but also in terms of the generation of parse results. Ambiguous grammars may generate exponentially, or even infinitely, many parse results, and these must be efficiently represented using Shared Packed Parse Forests (SPPFs). A particular feature of our approach is the use of Reynolds-style parametricity to ensure that the language a grammar describes cannot be affected by the representation of parse results.
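To make the phrase "allow the consumed input to affect the language being parsed" concrete, here is a minimal sketch of list-of-successes monadic parser combinators in Python. It is not the paper's formalization: the names `unit`, `bind`, `satisfy`, `alt`, `count`, `field`, and `parse_all` are hypothetical, chosen for illustration, and the sketch ignores non-terminals, left recursion, and SPPFs entirely.

```python
# A parser is a function (input, position) -> list of (result, next position).
# An empty list means failure; several entries mean an ambiguous parse.

def unit(value):
    """Succeed with `value` without consuming any input."""
    return lambda s, i: [(value, i)]

def bind(p, f):
    """Run p, then pick the next parser based on the value p produced."""
    return lambda s, i: [out for (a, j) in p(s, i) for out in f(a)(s, j)]

def satisfy(pred):
    """Consume exactly one character for which pred holds."""
    return lambda s, i: [(s[i], i + 1)] if i < len(s) and pred(s[i]) else []

def alt(p, q):
    """Try both alternatives and keep every result from each."""
    return lambda s, i: p(s, i) + q(s, i)

def count(n, p):
    """Run p exactly n times and collect the results in a list."""
    if n == 0:
        return unit([])
    return bind(p, lambda x: bind(count(n - 1, p), lambda xs: unit([x] + xs)))

digit = satisfy(lambda c: c in "0123456789")
letter = satisfy(str.isalpha)

# Length-prefixed field: a digit n followed by exactly n letters.
# The digit that was consumed decides what the rest of the input must be,
# which a fixed context-free right-hand side cannot express directly.
field = bind(digit, lambda d:
        bind(count(int(d), letter), lambda cs:
        unit("".join(cs))))

def parse_all(p, s):
    """Return the results of every parse that consumes the whole input."""
    return [a for (a, j) in p(s, 0) if j == len(s)]
```

With these definitions, `parse_all(field, "3abc")` yields one result, while `parse_all(field, "3ab")` yields none because the prefix demands three letters. Returning a list of results is the simplest way to expose ambiguity, but it enumerates every parse explicitly, which is the cost that a shared representation such as an SPPF is meant to avoid.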
