On the Emergence of Rules in Neural Networks

A simple associationist neural network learns to factor abstract rules (i.e., grammars) from sequences of arbitrary input symbols by inventing abstract representations that accommodate unseen symbol sets as well as unseen but similar grammars. The network is shown to transfer grammatical knowledge both to new symbol vocabularies and to new grammars. Analysis of the state space shows that the network learns generalized abstract structures of the input rather than simply memorizing the input strings. These representations are context-sensitive, hierarchical, and based on the state variable of the finite-state machines that the network has learned. Generalization to new symbol sets or grammars arises from the spatial nature of the network's internal representations, which allows new symbol sets to be encoded close to previously learned symbol sets in the network's hidden-unit space. These results run counter to arguments that learning algorithms based on weight adaptation after each exemplar presentation (such as the long-term potentiation found in the mammalian nervous system) cannot, in principle, extract the kind of symbolic knowledge from positive examples that prevailing human linguistic theory and evolutionary psychology prescribe.
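
To make the setup concrete, the following minimal sketch (an illustration, not the authors' implementation) trains a small Elman-style simple recurrent network with per-symbol weight updates to predict the next symbol in strings generated by a toy finite-state grammar. The grammar, symbol labels, network sizes, and learning rate are illustrative assumptions.

    # Minimal sketch: simple recurrent (Elman) network trained on strings from a
    # toy finite-state grammar. All specifics here are illustrative assumptions,
    # not the architecture or grammars used in the paper.
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy grammar: state -> list of (symbol, next_state); next_state None halts.
    GRAMMAR = {
        0: [("A", 1), ("B", 2)],
        1: [("C", 2), ("A", 1)],
        2: [("B", 0), ("D", None)],
    }
    SYMBOLS = ["A", "B", "C", "D", "#"]          # "#" marks end of string
    IDX = {s: i for i, s in enumerate(SYMBOLS)}

    def sample_string(max_len=12):
        """Walk the grammar from state 0, emitting symbols until it halts."""
        state, out = 0, []
        while state is not None and len(out) < max_len:
            sym, state = GRAMMAR[state][rng.integers(len(GRAMMAR[state]))]
            out.append(sym)
        out.append("#")
        return out

    def one_hot(sym):
        v = np.zeros(len(SYMBOLS))
        v[IDX[sym]] = 1.0
        return v

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    # Elman network: h(t) = tanh(W_xh x(t) + W_hh h(t-1)), output = softmax(W_hy h(t))
    n_in, n_hid = len(SYMBOLS), 8
    W_xh = rng.normal(0, 0.5, (n_hid, n_in))
    W_hh = rng.normal(0, 0.5, (n_hid, n_hid))
    W_hy = rng.normal(0, 0.5, (n_in, n_hid))
    lr = 0.1

    for epoch in range(2000):
        seq = sample_string()
        h = np.zeros(n_hid)
        # Weights are adapted after each symbol (truncated backprop of length 1),
        # in the spirit of per-exemplar weight adaptation discussed in the abstract.
        for t in range(len(seq) - 1):
            x, target = one_hot(seq[t]), one_hot(seq[t + 1])
            h_prev = h
            h = np.tanh(W_xh @ x + W_hh @ h_prev)
            y = softmax(W_hy @ h)
            dy = y - target                      # cross-entropy gradient at the output
            dh = (W_hy.T @ dy) * (1 - h ** 2)    # backprop through the tanh hidden layer
            W_hy -= lr * np.outer(dy, h)
            W_xh -= lr * np.outer(dh, x)
            W_hh -= lr * np.outer(dh, h_prev)

After training, the network's output approximates the distribution over grammatical continuations at each point in a string, and the hidden-state vectors collected over many strings can be clustered; vectors reached through the same grammar node tend to group together, which is the kind of state-space structure the abstract describes.
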
