Extracting error productions from a neural network-based LR parser

Abstract: It is well known that traditional rule-based parsers have poor error-recovery capability. This has limited their practicality in natural language processing, where robustness and flexibility are of primary concern. In view of this, we propose the neural network LR parser (NNLR), in which the shift-reduce parsing decisions of an LR parser are simulated by a feedforward neural network. Although trained on only a small set of grammatical sentences, the NNLR is capable of parsing a significantly larger number of erroneous sentences. To explore the knowledge encoded in the neural network that sustains this robust processing capacity, we analyze the NNLR in two ways. First, we show that the NNLR recovers erroneous sentences as if the parser had filled in some of the empty slots in the original LR parsing table; an augmented parsing table is thus constructed. Second, a set of new grammar rules, commonly called error productions, can be extracted from the trained network. When included in the original grammar, these rules allow certain erroneous sentences to be generated and parsed in addition to grammatical ones. In both analyses, the symbolic knowledge discovered is readily comprehensible, and it can potentially be reused by the original LR parser to enhance its robustness.
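
To make the parsing scheme concrete, the following is a minimal Python sketch of the NNLR decision step. It is not the paper's implementation: the network architecture, the one-hot input encoding, the grammar sizes (N_STATES, N_TOKENS), and the action inventory are all hypothetical stand-ins, and the weights are untrained placeholders for the network the paper trains on shift-reduce traces of grammatical sentences.

    import numpy as np

    rng = np.random.default_rng(0)

    N_STATES, N_TOKENS = 12, 8                             # hypothetical grammar sizes
    ACTIONS = ["shift", "reduce_0", "reduce_1", "accept"]  # hypothetical action set

    # Untrained feedforward weights; in the paper's setting the net would be
    # trained to map (state, lookahead) pairs to the LR parser's actions.
    W1 = rng.normal(size=(N_STATES + N_TOKENS, 16))
    W2 = rng.normal(size=(16, len(ACTIONS)))

    def nn_action(state: int, token: int) -> str:
        """Simulate one shift-reduce decision with the feedforward net."""
        x = np.zeros(N_STATES + N_TOKENS)
        x[state] = 1.0                 # one-hot current LR state
        x[N_STATES + token] = 1.0      # one-hot lookahead token
        h = np.tanh(x @ W1)
        return ACTIONS[int(np.argmax(h @ W2))]

    # Reading off an "augmented parsing table": query the net for every
    # (state, token) pair, including slots the original LR table leaves empty.
    augmented_table = {
        (s, t): nn_action(s, t)
        for s in range(N_STATES)
        for t in range(N_TOKENS)
    }

Enumerating the network over all (state, token) pairs, as in the last step, is one plausible way to read off the augmented table the abstract describes: the net returns some action even for slots the hand-written LR table leaves empty, and it is precisely those filled-in slots that account for the parser's recovery of erroneous sentences.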
