Packrat parsing: simple, powerful, lazy, linear time, functional pearl

Packrat parsing is a novel technique for implementing parsers in a lazy functional programming language. A packrat parser provides the power and flexibility of top-down parsing with backtracking and unlimited lookahead, but nevertheless guarantees linear parse time. Any language defined by an LL(k) or LR(k) grammar can be recognized by a packrat parser, in addition to many languages that conventional linear-time algorithms do not support. This additional power simplifies the handling of common syntactic idioms such as the widespread but troublesome longest-match rule, enables the use of sophisticated disambiguation strategies such as syntactic and semantic predicates, provides better grammar composition properties, and allows lexical analysis to be integrated seamlessly into parsing. Yet despite its power, packrat parsing shares the same simplicity and elegance as recursive descent parsing; in fact converting a backtracking recursive descent parser into a linear-time packrat parser often involves only a fairly straightforward structural change. This paper describes packrat parsing informally with emphasis on its use in practical applications, and explores its advantages and disadvantages with respect to the more conventional alternatives.
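The "fairly straightforward structural change" mentioned above can be illustrated with a minimal sketch. It is written in Python rather than a lazy functional language, so explicit memoization (`functools.lru_cache`) stands in for the lazy evaluation the paper relies on. The toy arithmetic grammar and the names `make_parser`, `parse`, `additive`, `multitive`, `primary`, and `peek` are assumptions chosen for illustration, not the paper's code. Each parsing function is an ordinary backtracking recursive descent routine keyed on the input position; caching its result per position means each (rule, position) pair is computed at most once, which is the source of the linear-time bound.

```python
from functools import lru_cache
from typing import Callable, Optional, Tuple

# A parse result is (semantic value, next input position), or None on failure.
Result = Optional[Tuple[int, int]]

DIGITS = "0123456789"


def make_parser(text: str) -> Callable[[int], Result]:
    """Build memoized parsing functions for one input string (hypothetical helper)."""

    def peek(pos: int) -> Optional[str]:
        # Character at pos, or None at end of input.
        return text[pos] if pos < len(text) else None

    @lru_cache(maxsize=None)
    def additive(pos: int) -> Result:
        # Alternative 1: Multitive '+' Additive
        m = multitive(pos)
        if m is not None and peek(m[1]) == "+":
            a = additive(m[1] + 1)
            if a is not None:
                return m[0] + a[0], a[1]
        # Alternative 2: Multitive. Backtracking re-invokes multitive at the
        # same position, but the cache answers without re-parsing.
        return multitive(pos)

    @lru_cache(maxsize=None)
    def multitive(pos: int) -> Result:
        # Alternative 1: Primary '*' Multitive
        p = primary(pos)
        if p is not None and peek(p[1]) == "*":
            m = multitive(p[1] + 1)
            if m is not None:
                return p[0] * m[0], m[1]
        # Alternative 2: Primary (served from the cache)
        return primary(pos)

    @lru_cache(maxsize=None)
    def primary(pos: int) -> Result:
        # Alternative 1: '(' Additive ')'
        if peek(pos) == "(":
            a = additive(pos + 1)
            if a is not None and peek(a[1]) == ")":
                return a[0], a[1] + 1
        # Alternative 2: a single decimal digit
        ch = peek(pos)
        if ch is not None and ch in DIGITS:
            return int(ch), pos + 1
        return None

    return additive


def parse(text: str) -> int:
    """Parse and evaluate the whole input, or raise ValueError."""
    result = make_parser(text)(0)
    if result is None or result[1] != len(text):
        raise ValueError(f"cannot parse {text!r}")
    return result[0]
```

Removing the three `lru_cache` decorators leaves a plain backtracking recursive descent parser that accepts the same inputs but may re-parse the same substring many times. Note that this sketch recurses once per nesting level, so very long inputs would hit Python's recursion limit; it is meant to show the structure, not to serve as a production parser.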
