A Generated Parser of C++

C++ is an extraordinarily difficult programming language to parse. The language cannot readily be approximated with an LL or LR grammar (regardless of lookahead size), and syntax analysis depends on semantic disambiguation. While conventional (LALR(1) and LL(k)) parser generation tools have been used to build C++ parsers, the effort involved in grammar modification and custom code development is substantial, rivaling the effort of constructing a parser manually. We find that Tomita (GLR) parsing, more widely known in the field of Natural Language Processing, is better suited than conventional approaches to the task of parsing C++. A Tomita parser generator requires no artificial modification of the grammar. The parser it emits processes actual C++ source code in near-linear time and allows syntactic analysis to be separated from semantic analysis.
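
The dependence of syntax analysis on semantic information can be illustrated with a small, well-known case. The sketch below is not taken from the paper; the namespaces `as_type` and `as_value` and the names `a` and `b` are hypothetical, chosen only to show that the token sequence `a * b` is a declaration when `a` names a type and a multiplication when `a` names a variable.

```cpp
// Illustrative sketch only: the same tokens `a * b` parse differently
// depending on what the name `a` denotes in the enclosing scope.

namespace as_type {
    struct a {};               // here `a` names a type

    inline bool demo() {
        a * b;                 // declaration: b is a pointer to a
        b = nullptr;
        return b == nullptr;
    }
}

namespace as_value {
    inline int demo() {
        int a = 6;             // here `a` names a variable
        int b = 7;
        return a * b;          // expression: multiplication of a and b
    }
}
```

A parser that consults no symbol table cannot choose between the two readings at the point where it sees `a * b`. A GLR parser can, in principle, carry both interpretations forward and leave the choice to a later semantic pass.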