Natural language grammatical inference: a comparison of recurrent neural networks and machine learning methods

We consider the task of training a neural network to classify natural language sentences as grammatical or ungrammatical, thereby exhibiting the same kind of discriminatory power provided by the Principles and Parameters linguistic framework, or Government and Binding theory. We investigate the following models: feed-forward neural networks, Frasconi-Gori-Soda and Back-Tsoi locally recurrent neural networks, Williams & Zipser and Elman recurrent neural networks, Euclidean and edit-distance nearest-neighbor classifiers, and decision trees. The non-neural-network machine learning methods are included primarily for comparison. We find that the Elman and Williams & Zipser recurrent neural networks are able to find a representation of the grammar that we believe is more parsimonious than those found by the other models, and these two models exhibit the best performance.
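To make the classification setup concrete, the sketch below shows how an Elman-style simple recurrent network can score a token sequence as grammatical or ungrammatical. This is an illustrative sketch only, not the implementation used in the paper: the one-hot input encoding, layer sizes, random (untrained) weights, and 0.5 decision threshold are assumptions made for the example; in practice the weights would be trained, for example by backpropagation through time.

# Minimal sketch of an Elman simple recurrent network used as a binary
# grammaticality classifier. Not the authors' implementation: the one-hot
# encoding, layer sizes, and 0.5 threshold are illustrative assumptions,
# and the weights here are random rather than trained.
import numpy as np

rng = np.random.default_rng(0)

VOCAB_SIZE = 20    # assumed size of the (category-style) input encoding
HIDDEN_SIZE = 10   # assumed number of recurrent hidden units

# Randomly initialised weights; training (e.g. backpropagation through
# time) would normally fit these to grammatical/ungrammatical examples.
W_xh = rng.normal(scale=0.1, size=(HIDDEN_SIZE, VOCAB_SIZE))
W_hh = rng.normal(scale=0.1, size=(HIDDEN_SIZE, HIDDEN_SIZE))
b_h = np.zeros(HIDDEN_SIZE)
w_hy = rng.normal(scale=0.1, size=HIDDEN_SIZE)
b_y = 0.0

def one_hot(token_id):
    # Encode a token (word category) ID as a one-hot input vector.
    x = np.zeros(VOCAB_SIZE)
    x[token_id] = 1.0
    return x

def classify(sentence):
    # Elman recurrence: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h).
    # The hidden state at each step acts as the "context" input for
    # the next step; the final state is read out by a sigmoid unit.
    h = np.zeros(HIDDEN_SIZE)
    for token_id in sentence:
        x = one_hot(token_id)
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
    score = 1.0 / (1.0 + np.exp(-(w_hy @ h + b_y)))
    return float(score), score >= 0.5

# Example: a sentence encoded as a sequence of (assumed) category IDs.
score, grammatical = classify([3, 7, 1, 12, 5])
print(f"score={score:.3f} grammatical={grammatical}")

The recurrence over the hidden state is the defining feature of the Elman architecture; the Williams & Zipser network differs mainly in being fully recurrent and trained with a different gradient algorithm.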

[1] Raymond L. Watrous, et al., Induction of Finite-State Languages Using Second-Order Recurrent Networks, 1992, Neural Computation.

[2] David Sankoff, et al., Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, 1983.

[3] C. Lee Giles, et al., Learning and Extracting Finite State Automata with Second-Order Recurrent Neural Networks, 1992, Neural Computation.

[4] Juan Uriagereka, et al., A Course in GB Syntax: Lectures on Binding and Empty Categories, 1988.

[5] Jeffrey L. Elman, et al., Finding Structure in Time, 1990, Cognitive Science.

[6] David S. Touretzky, Rules and Maps in Connectionist Symbol Processing, 1989.

[7] James P. Crutchfield, et al., Computation at the Onset of Chaos, 1991.

[8] David S. Touretzky, et al., BoltzCONS: Dynamic Symbol Structures in a Connectionist Network, 1990, Artificial Intelligence.

[9] Noam Chomsky, Knowledge of Language, 1986.

[10] Geoffrey E. Hinton, et al., Proceedings of the 1988 Connectionist Models Summer School, 1989.

[11] N. Chater, et al., Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society, 1992.

[12] Andreas Stolcke, Learning Feature-Based Semantics with Simple Recurrent Networks, 1990.

[13] J. Ross Quinlan, et al., C4.5: Programs for Machine Learning, 1992.

[14] S. Haykin, Neural Networks: A Comprehensive Foundation, 1994.

[15] Jeffrey L. Elman, et al., Distributed Representations, Simple Recurrent Networks, and Grammatical Structure, 1991, Machine Learning.

[16] Noam Chomsky, et al., Lectures on Government and Binding, 1981.

[17] David Pesetsky, et al., Paths and Categories, 1982.

[18] Yves Chauvin, et al., Backpropagation: Theory, Architectures, and Applications, 1995.

[19] Alessandro Sperduti, et al., Learning Distributed Representations for the Classification of Terms, 1995, IJCAI.

[20] R. Taraban, et al., Language Learning: Cues or Rules?, 1989.

[21] Ronald J. Williams, et al., Gradient-Based Learning Algorithms for Recurrent Connectionist Networks, 1990.

[22] W. H. Zurek, Complexity, Entropy and the Physics of Information, 1990.

[23] Hava T. Siegelmann, et al., On the Computational Power of Neural Nets, 1995, Journal of Computer and System Sciences.

[24] C. Lee Giles, et al., Learning a Class of Large Finite State Machines with a Recurrent Neural Network, 1995, Neural Networks.

[25] Ronald J. Williams, et al., A Learning Algorithm for Continually Running Fully Recurrent Neural Networks, 1989, Neural Computation.

[26] Jordan B. Pollack, et al., Recursive Distributed Representations, 1990, Artificial Intelligence.

[27] Aravind K. Joshi, et al., Natural Language Parsing: Tree Adjoining Grammars: How Much Context-Sensitivity Is Required to Provide Reasonable Structural Descriptions?, 1985.

[28] James L. McClelland, et al., Learning and Applying Contextual Constraints in Sentence Comprehension, 1990, Artificial Intelligence.

[29] Giovanni Soda, et al., Local Feedback Multilayered Networks, 1992, Neural Computation.

[30] John E. Moody, et al., Note on Learning Rate Schedules for Stochastic Optimization, 1990, NIPS.

[31] Garrison W. Cottrell, et al., A Connectionist Perspective on Prosodic Structure, 1989.

[32] Ah Chung Tsoi, et al., FIR and IIR Synapses, a New Neural Network Architecture for Time Series Modeling, 1991, Neural Computation.

[33] Noam Chomsky, Knowledge of Language: Its Nature, Origin, and Use, 1988.

[34] Fernando Pereira, et al., Inside-Outside Reestimation From Partially Bracketed Corpora, 1992, HLT.

[35] Padhraic Smyth, et al., Discrete Recurrent Neural Networks for Grammatical Inference, 1994, IEEE Transactions on Neural Networks.

[36] George Berg, et al., A Connectionist Parser with Recursive Sentence Structure and Lexical Disambiguation, 1992, AAAI.

[37] Mary Hare, et al., The Role of Similarity in Hungarian Vowel Harmony: A Connectionist Account, 1990.