GLR*: An Efficient Noise-skipping Parsing Algorithm for Context-Free Grammars

This chapter describes GLR*, a parser that can parse any input sentence by ignoring its unrecognizable parts. Using an efficient algorithm, the parser finds and parses a maximal subset of the original input that is parsable, and thus returns the parse with the fewest skipped words. The parser returns some parse(s) for any input sentence, unless no part of the sentence can be recognized at all. Formally, the problem can be defined as follows: given a context-free grammar G and a sentence S, find and parse S', the largest subset of the words of S such that S' ∈ L(G). The algorithm described in this chapter is a modification of the Generalized LR (Tomita) parsing algorithm (Tomita, 1986). The parser accommodates the skipping of words by allowing shift operations to be performed from inactive state nodes of the Graph Structured Stack. A heuristic similar to beam search makes the algorithm computationally tractable. The modified parser, GLR*, has been implemented and integrated with the latest version of the Generalized LR Parser/Compiler (Tomita et al., 1988; Tomita, 1990). We discuss an application of the GLR* parser to spontaneous speech understanding and present some preliminary tests of the parser's utility in such settings.
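To make the formal problem statement concrete, the following is a minimal brute-force sketch of the specification only. It is not the GLR* algorithm, which extends the Generalized LR parser and its Graph Structured Stack. The toy grammar, the lexicon, and the function names `recognizes` and `maximal_parsable_subsets` are illustrative assumptions, as is the reading of "subset" as an order-preserving selection of words. The sketch enumerates candidate subsets from largest to smallest and tests each with a CYK recognizer, so it takes exponential time and serves only to illustrate what the parser is asked to compute.

```python
from itertools import combinations

# Hypothetical toy grammar in Chomsky normal form (not taken from the chapter).
BINARY = {                      # rules of the form A -> B C
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
LEXICON = {                     # rules of the form A -> word
    "the": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "saw": {"V"},
}


def recognizes(words, start="S"):
    """CYK recognizer: True if `words` is in L(G) for the toy grammar."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals that derive words[i..j] inclusive.
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        table[i][i] = set(LEXICON.get(word, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):           # split point between the halves
                for left in table[i][k]:
                    for right in table[k + 1][j]:
                        table[i][j] |= BINARY.get((left, right), set())
    return start in table[0][n - 1]


def maximal_parsable_subsets(words):
    """Return every longest order-preserving subset of `words` in L(G).

    Brute force: try subsets from largest to smallest and stop at the
    first size that yields at least one recognizable word sequence.
    """
    n = len(words)
    for size in range(n, 0, -1):
        found = []
        for keep in combinations(range(n), size):
            candidate = [words[i] for i in keep]
            if recognizes(candidate) and candidate not in found:
                found.append(candidate)
        if found:
            return found
    return []                   # no part of the sentence is recognizable


if __name__ == "__main__":
    noisy = "the dog um saw the uh cat".split()
    print(maximal_parsable_subsets(noisy))
```

For the noisy input above, the sketch skips the two filler words and prints `[['the', 'dog', 'saw', 'the', 'cat']]`. An input with no recognizable part, such as `["um", "uh"]`, yields an empty list, which mirrors the case in which the parser returns no parse.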