C++ is an extraordinarily difficult programming language to parse. The language cannot readily be approximated with an LL or LR grammar (regardless of lookahead size), and syntax analysis depends on semantic disambiguation. While conventional parser generation tools (LALR(1) and LL(k)) have been used to build C++ parsers, the effort involved in grammar modification and custom code development is substantial, rivaling the effort of constructing a parser manually. We find that Tomita (GLR) parsing, which is more widely known in the field of Natural Language Processing, is better suited than conventional approaches to the task of parsing C++. A Tomita parser generator requires no artificial modification of the grammar and emits a parser that processes actual C++ source code in near-linear time, while allowing syntactic analysis to be separated from semantic analysis.
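As an illustration of why syntax analysis depends on semantic disambiguation, the following minimal sketch shows one token sequence, `x * y;`, that is a declaration in one scope and an expression statement in another. It is not taken from the paper; all names (`as_type`, `as_value`, `Num`, `x`, `y`) are hypothetical and chosen only for this example.

```cpp
// Illustrative sketch only: every name below is hypothetical.

namespace as_type {
    struct x {};                    // `x` names a type in this scope

    bool declares_pointer() {
        x * y;                      // parsed as a declaration: y has type x*
        y = nullptr;                // compiles only because y was declared above
        return y == nullptr;
    }
}

namespace as_value {
    struct Num { int v; };
    int multiplications = 0;

    Num operator*(Num a, Num b) {   // counts calls so the expression is observable
        ++multiplications;
        return Num{a.v * b.v};
    }

    Num x{6}, y{7};                 // `x` and `y` name objects in this scope

    int evaluates_product() {
        x * y;                      // same tokens, parsed as an expression statement
        return multiplications;
    }
}
```

A parser that must commit to a single parse tree needs to know whether `x` names a type before it can choose between the two readings. A GLR parser can instead carry both candidate parses forward and leave the choice to a later semantic pass.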