Natural Language Grammatical Inference with Recurrent Neural Networks

This paper examines the inductive inference of a complex grammar with neural networks; specifically, the task considered is that of training a network to classify natural language sentences as grammatical or ungrammatical, thereby exhibiting the same kind of discriminatory power provided by the Principles and Parameters linguistic framework, or Government-and-Binding theory. Neural networks are trained, without the division into learned vs. innate components assumed by Chomsky, in an attempt to produce the same judgments as native speakers on sharply grammatical/ungrammatical data. How a recurrent neural network could possess linguistic capability and the properties of various common recurrent neural network architectures are discussed. The problem exhibits training behavior which is often not present with smaller grammars, and training was initially difficult. However, after implementing several techniques aimed at improving the convergence of the gradient descent backpropagation-through-time training algorithm, significant learning was possible. It was found that certain architectures are better able to learn an appropriate grammar. The operation and training of the networks are analyzed. Finally, the extraction of rules in the form of deterministic finite state automata is investigated.
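
The abstract describes the approach only at a high level. As a rough illustration of the kind of pipeline it outlines, the sketch below trains an Elman-style simple recurrent network to label token sequences as grammatical or ungrammatical and then performs a crude finite-state rule extraction by quantising the hidden state space. This is a minimal sketch under stated assumptions, not the authors' implementation: the use of PyTorch, the toy vocabulary and sentences, and all names (VOCAB, SENTENCES, SRNClassifier, encode, extract_transitions) and hyperparameters are hypothetical choices made for illustration.

```python
# Minimal sketch, NOT the authors' implementation: an Elman-style simple recurrent
# network classifies toy token sequences as grammatical (1) or ungrammatical (0),
# trained by gradient descent over the unrolled sequence (backpropagation through
# time, as provided by autograd). A crude DFA-style rule extraction then quantises
# the hidden state space and tabulates transitions. Vocabulary, data, names and
# hyperparameters are placeholders chosen for illustration.
import torch
import torch.nn as nn

VOCAB = {"<pad>": 0, "the": 1, "dog": 2, "dogs": 3, "barks": 4, "bark": 5}

# Toy labelled agreement data (illustrative only).
SENTENCES = [
    (["the", "dog", "barks"], 1),
    (["the", "dogs", "bark"], 1),
    (["the", "dog", "bark"], 0),
    (["the", "dogs", "barks"], 0),
]

class SRNClassifier(nn.Module):
    """Elman-style simple recurrent network with a single logistic output unit."""
    def __init__(self, vocab_size, embed_dim=8, hidden_dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, 1)

    def forward(self, tokens):                        # tokens: (batch, time)
        states, h_final = self.rnn(self.embed(tokens))
        logits = self.out(h_final.squeeze(0))         # classify from the final hidden state
        return logits, states                         # states: (batch, time, hidden)

def encode(words):
    return torch.tensor([[VOCAB[w] for w in words]])  # batch of one sequence

model = SRNClassifier(len(VOCAB))
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(200):
    for words, label in SENTENCES:
        optimizer.zero_grad()
        logits, _ = model(encode(words))
        loss = loss_fn(logits.view(-1), torch.tensor([float(label)]))
        loss.backward()                               # backpropagation through time
        optimizer.step()

def extract_transitions(model, sentences, threshold=0.0):
    """Crude rule extraction: binarise each (tanh) hidden unit and record the
    discrete state reached after each input symbol, giving a partial DFA-like
    transition table (accepting states would be read off the output unit)."""
    transitions = {}
    with torch.no_grad():
        for words, _ in sentences:
            _, states = model(encode(words))
            prev = "q_start"
            for t, word in enumerate(words):
                state = tuple((states[0, t] > threshold).int().tolist())
                transitions[(prev, word)] = state
                prev = state
    return transitions

print(extract_transitions(model, SENTENCES))
```

Quantising each hidden unit into a small number of intervals is only one extraction strategy discussed in the literature cited below; clustering the hidden-state vectors, as in the self-clustering recurrent networks or the second-order network extraction work referenced here, is a common alternative.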

[2]  Noam Chomsky,et al.  Three models for the description of language , 1956, IRE Trans. Inf. Theory.

[3]  King-Sun Fu,et al.  Syntactic Pattern Recognition And Applications , 1968 .

[4]  Jeffrey D. Ullman,et al.  Introduction to Automata Theory, Languages and Computation , 1979 .

[5]  Noam Chomsky,et al.  Lectures on Government and Binding , 1981 .

[6]  David Pesetsky,et al.  Paths and categories , 1982 .

[7]  Noam Chomsky Knowledge of Language , 1986 .

[8]  James L. McClelland,et al.  On learning the past tenses of English verbs: implicit rules or parallel distributed processing , 1986 .

[9]  J J Hopfield,et al.  Learning algorithms and probability distributions in feed-forward and feed-back networks. , 1987, Proceedings of the National Academy of Sciences of the United States of America.

[10]  Eric B. Baum,et al.  Supervised Learning of Probability Distributions by Neural Networks , 1987, NIPS.

[11]  Patrice Y. Simard,et al.  Analysis of Recurrent Backpropagation , 1988 .

[12]  Esther Levin,et al.  Accelerated Learning in Layered Neural Networks , 1988, Complex Syst..

[13]  M. W. Shields An Introduction to Automata Theory , 1988 .

[14]  Michael A. Arbib,et al.  An Introduction to Formal Language Theory , 1988, Texts and Monographs in Computer Science.

[15]  Juan Uriagereka,et al.  A Course in GB Syntax: Lectures on Binding and Empty Categories , 1988 .

[16]  R. Taraban,et al.  Language learning: Cues or rules? , 1989 .

[17]  Etienne Barnard,et al.  A comparison between criterion functions for linear classifiers, with an application to neural nets , 1989, IEEE Trans. Syst. Man Cybern..

[18]  Garrison W. Cottrell,et al.  A Connectionist Perspective on Prosodic Structure , 1989 .

[19]  David S. Touretzky Rules and Maps in Connectionist Symbol Processing , 1989 .

[20]  C. Lee Giles,et al.  Higher Order Recurrent Networks and Grammatical Inference , 1989, NIPS.

[21]  L. Ingber Very fast simulated re-annealing , 1989 .

[22]  Michael C. Mozer,et al.  A Focused Backpropagation Algorithm for Temporal Pattern Recognition , 1989, Complex Syst..

[23]  James L. McClelland,et al.  Finite State Automata and Simple Recurrent Networks , 1989, Neural Computation.

[24]  Ronald J. Williams,et al.  A Learning Algorithm for Continually Running Fully Recurrent Neural Networks , 1989, Neural Computation.

[25]  Kumpati S. Narendra,et al.  Identification and control of dynamical systems using neural networks , 1990, IEEE Trans. Neural Networks.

[26]  Andreas Stolcke Learning Feature-based Semantics with Simple Recurrent Networks , 1990 .

[27]  Jing Peng,et al.  An Efficient Gradient-Based Algorithm for On-Line Training of Recurrent Network Trajectories , 1990, Neural Computation.

[28]  James L. McClelland,et al.  Learning and Applying Contextual Constraints in Sentence Comprehension , 1990, Artif. Intell..

[29]  Geoffrey E. Hinton,et al.  Distributed Representations , 1986, The Philosophy of Artificial Intelligence.

[30]  John E. Moody,et al.  Note on Learning Rate Schedules for Stochastic Optimization , 1990, NIPS.

[31]  Mary Hare,et al.  The Role of Similarity in Hungarian Vowel Harmony: a Connectionist Account , 1990 .

[32]  Michael I. Jordan Attractor dynamics and parallelism in a connectionist sequential machine , 1990 .

[33]  Anders Krogh,et al.  Introduction to the theory of neural computation , 1994, The advanced book program.

[34]  C. Lee Giles,et al.  Extracting and Learning an Unknown Grammar with Recurrent Neural Networks , 1991, NIPS.

[35]  Jeffrey L. Elman,et al.  Distributed Representations, Simple Recurrent Networks, and Grammatical Structure , 1991, Mach. Learn..

[36]  Geoffrey E. Hinton Learning and Applying Contextual Constraints in Sentence Comprehension , 1991 .

[37]  John E. Moody,et al.  Towards Faster Stochastic Gradient Search , 1991, NIPS.

[38]  Fernando Pereira,et al.  Inside-Outside Reestimation From Partially Bracketed Corpora , 1992, HLT.

[39]  C. Lee Giles,et al.  Learning and Extracting Finite State Automata with Second-Order Recurrent Neural Networks , 1992, Neural Computation.

[40]  F. Pereira,et al.  Inside-Outside Reestimation From Partially Bracketed Corpora , 1992, ACL.

[41]  Giovanni Soda,et al.  Local Feedback Multilayered Networks , 1992, Neural Computation.

[42]  Raymond L. Watrous,et al.  Induction of Finite-State Languages Using Second-Order Recurrent Networks , 1992, Neural Computation.

[43]  Padhraic Smyth,et al.  Learning Finite State Machines With Self-Clustering Recurrent Networks , 1993, Neural Computation.

[44]  L. Ingber Adaptive Simulated Annealing (ASA) , 1993 .

[45]  Etienne Barnard,et al.  Backpropagation uses prior information efficiently , 1993, IEEE Trans. Neural Networks.

[46]  S. Haykin,  Neural Networks: A Comprehensive Foundation , 1994 .

[47]  Charles X. Ling  Learning the past tense of English verbs , 1994 .

[48]  Mike Alder,et al.  Natural Language Grammatical Inference , 1994 .

[49]  Ah Chung Tsoi,et al.  Locally recurrent globally feedforward networks: a critical review of architectures , 1994, IEEE Trans. Neural Networks.

[50]  C. Lee Giles,et al.  An experimental comparison of recurrent neural networks , 1994, NIPS.

[51]  Franz J. Kurfess,et al.  Connectionist Symbol Processing , 1994 .

[52]  Andreas Stolcke,et al.  Bayesian learning of probabilistic language models , 1994 .

[53]  D. Signorini,et al.  Neural networks , 1995, The Lancet.

[54]  Hava T. Siegelmann,et al.  On the Computational Power of Neural Nets , 1995, J. Comput. Syst. Sci..

[55]  Sandiway Fong,et al.  Natural language grammatical inference: a comparison of recurrent neural networks and machine learning methods , 1995, Learning for Natural Language Processing.

[56]  Hava T. Siegelmann  Computation Beyond the Turing Limit , 1995, Science.

[57]  Giovanni Soda,et al.  Unified Integration of Explicit Knowledge and Learning by Example in Recurrent Networks , 1995, IEEE Trans. Knowl. Data Eng..

[58]  Mike Casey,et al.  The Dynamics of Discrete-Time Computation, with Application to Recurrent Neural Networks and Finite State Machine Extraction , 1996, Neural Computation.

[59]  Paolo Frasconi,et al.  Computational capabilities of local-feedback recurrent networks acting as finite-state machines , 1996, IEEE Trans. Neural Networks.

[60]  C. Lee Giles,et al.  Constructing deterministic finite-state automata in recurrent neural networks , 1996, JACM.

[61]  C. Lee Giles,et al.  Extraction of rules from discrete-time recurrent neural networks , 1996, Neural Networks.

[62]  C. Lee Giles,et al.  Rule Revision With Recurrent Neural Networks , 1996, IEEE Trans. Knowl. Data Eng..

[63]  Richard D. Braatz,et al.  On the "Identification and control of dynamical systems using neural networks" , 1997, IEEE Trans. Neural Networks.

[64]  Hava T. Siegelmann,et al.  Computational capabilities of recurrent NARX neural networks , 1997, IEEE Trans. Syst. Man Cybern. Part B.

[65]  Michael I. Jordan Serial Order: A Parallel Distributed Processing Approach , 1997 .

[66]  Scott Kirkpatrick,et al.  Simulated annealing , 1998 .

[67]  M. Inés Torres,et al.  Pattern recognition and applications , 2000 .