A Simple Realization of LR-Parsers for Regular Right Part Grammars

A regular right part grammar (RRPG), also called an extended context-free grammar, is a context-free grammar in which regular expressions over grammar symbols are allowed in the right-hand sides of productions [1-6]. RRPGs represent the syntax of programming languages naturally and concisely, and are widely used to specify programming languages.
An RRPG is called an ELR(k) grammar if its sentences can be analyzed from left to right by the LR-parsing method with a lookahead of k symbols. More precisely, an RRPG is an ELR(k) grammar if (i) $S \Rightarrow^{+} S$ is impossible, and (ii) $S \Rightarrow^{*} \alpha A z \Rightarrow \alpha\beta z$, $S \Rightarrow^{*} \gamma B x \Rightarrow \alpha\beta y$, and $\mathrm{FIRST}_k(z) = \mathrm{FIRST}_k(y)$ imply $A = B$, $\alpha = \gamma$, and $x = y$, where all derivations are rightmost [6]. The corresponding parser is called an ELR-parser in this paper. In the following, we deal with the case k = 1.
The main problem in ELR-parsing of ELR-grammars lies in the 'reduce' action, which is performed when the right-hand side of a production has been recognized. In an ELR-grammar, the length of the sentential form generated by the regular expression on the right-hand side of a production is generally not fixed, so extra work is required to identify the left end of the handle to be reduced.
Three approaches have been proposed so far for ELR-parsing of ELR-grammars:
(1) Transform the ELR-grammar to an equivalent LR-grammar and apply standard techniques for constructing the LR-parser [1,2].
(2) Build the ELR-parser directly from the ELR-grammar (another method of [1,3,4,5]).
(3) A method similar to (2), but in which transformation to another ELR-grammar is necessary in some cases [6].
In approaches (1) and (3), extra nonterminals are added to the transformed grammar, and the correspondence between semantic rules and syntax rules is lost. In this paper, we present a simple method based on approach (2); no grammar transformation is necessary.
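The handle-length problem described above can be made concrete with a small sketch. The production List -> item ("," item)* and the single-character encoding of grammar symbols (i for item, c for the comma) are assumptions chosen for illustration and do not come from the cited work. The sketch enumerates symbol strings and reports which lengths the regular right part can generate.

```python
import re
from itertools import product

# Hypothetical RRPG production: List -> item ("," item)*
# Grammar symbols are encoded as single characters: i = item, c = ","
RIGHT_PART = re.compile(r"i(ci)*")

def handle_lengths(max_len):
    """Return the lengths (up to max_len) of symbol strings generated by RIGHT_PART."""
    lengths = []
    for n in range(1, max_len + 1):
        # Try every string of n symbols and keep n if at least one matches.
        if any(RIGHT_PART.fullmatch("".join(s)) for s in product("ic", repeat=n)):
            lengths.append(n)
    return lengths

print(handle_lengths(7))  # → [1, 3, 5, 7]
```

Because one production yields handles of length 1, 3, 5, and so on, a parser cannot pop a fixed number of stack entries at reduction time, as an ordinary LR-parser does.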
In previous methods based on approach (2), it was necessary either to add readback machines [3,4,5] or to inspect the lookback state [5,8] at reduction time, and the algorithms for these methods were rather complicated. In our method, an ELR-parser can be realized by a slight refinement of the usual LR-parser technique: so-called count values are stored to record the length of the string of grammar symbols generated by the right-hand side of a production. Although the parsing efficiency of the method is not the best, the generation of the parser is simple and practical.
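The counting idea can be illustrated with a deliberately simplified sketch. It is not the construction developed in this paper: the class CountingStack, the function parse, and the bracketed-list token language are all hypothetical, and the sketch assumes that the start of each right part is known at shift time (here, every "[" token opens one), whereas a real ELR-parser must determine this from its LR states. The driver also performs no syntax checking. The sketch only shows how a stored count locates the left end of a variable-length handle without readback.

```python
class CountingStack:
    """Parse stack plus a stack of counts, one count per right part in progress."""

    def __init__(self):
        self.symbols = []     # grammar symbols recognized so far
        self.counts = []      # counts[-1] = symbols seen in the innermost right part
        self.reductions = []  # (lhs, handle) pairs, recorded for inspection

    def begin(self):
        # A new right part starts: open a fresh counter.
        self.counts.append(0)

    def shift(self, symbol):
        self.symbols.append(symbol)
        self.counts[-1] += 1

    def reduce(self, lhs):
        # The stored count gives the handle length directly.
        n = self.counts.pop()
        start = len(self.symbols) - n
        handle = self.symbols[start:]
        del self.symbols[start:]
        self.reductions.append((lhs, handle))
        self.shift(lhs)  # lhs counts as one symbol of the enclosing right part


def parse(tokens):
    """Drive the stack over a bracketed list (assumed well formed)."""
    stack = CountingStack()
    stack.begin()  # counter for the enclosing (start) right part
    for tok in tokens:
        if tok == "[":
            stack.begin()       # assumption: "[" always opens a new right part
            stack.shift(tok)
        elif tok == "]":
            stack.shift(tok)
            stack.reduce("List")
        else:
            stack.shift(tok)
    return stack


result = parse("[ x , [ x ] , x ]".split())
for lhs, handle in result.reductions:
    print(lhs, len(handle), " ".join(handle))
```

In this run the inner list is reduced with a handle of three symbols and the outer list with a handle of seven, and in both cases the number of entries to pop is read from the count stack rather than recomputed from the parse stack.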