PAPAGENO: A Parallel Parser Generator for Operator Precedence Grammars

In almost all language processing applications, languages are parsed with classical algorithms (such as the LR(1) parsers generated by Bison), which are sequential because of their left-to-right, state-dependent nature. Although early theoretical studies on parallel parsing algorithms delineated potential speedups on abstract parallel machines using a data-parallel approach, practical developments have not materialized, except in recent experiments on ad hoc parsers for large XML files. We describe a general-purpose practical generator (PAPAGENO) able to produce efficient deterministic parallel parsers, which exhibit significant speedups when parsing large texts on modern multi-core machines while not penalizing sequential operation. The generated parsers rely on the properties of Floyd’s operator precedence grammars to provide a naturally parallel implementation of the parsing process. Each text portion is parsed in parallel and independently, without communication or synchronization, until all partial parse stacks are recombined into the final result. Since Floyd’s grammars can express most syntaxes with little adaptation, we have performed extensive experiments on both synthetically generated texts and real JSON documents. The effective parallel code portion in the generated parsers exceeds 80% in most of the tested scenarios.
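
As a rough reading aid for the 80% figure, the sketch below applies Amdahl's law, speedup(n) = 1 / ((1 - p) + p / n). It assumes that the reported parallel code portion can be treated as the parallel fraction p; the abstract does not say how the figure was derived, so this is an illustration and not the authors' evaluation method. The function names `amdahl_speedup` and `amdahl_limit` are hypothetical, chosen only for this example.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup under Amdahl's law for a given parallel fraction.

    Assumption: the parallel part scales perfectly with the number of
    workers and the serial part does not shrink at all.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def amdahl_limit(parallel_fraction: float) -> float:
    """Upper bound on speedup as the number of workers grows without limit."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if parallel_fraction == 1.0:
        return float("inf")
    return 1.0 / (1.0 - parallel_fraction)


if __name__ == "__main__":
    p = 0.80  # assumed parallel fraction, taken from the 80% figure
    for n in (1, 2, 4, 8, 16):
        print(f"{n:2d} workers: ideal speedup {amdahl_speedup(p, n):.2f}x")
    print(f"asymptotic bound: {amdahl_limit(p):.2f}x")
```

Under these assumptions, a parallel fraction of exactly 0.8 gives an ideal speedup of 2.5x on 4 workers and can never exceed 5x; fractions above 0.8 raise both numbers accordingly.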
