Infinite RAAM: A Principled Connectionist Basis for Grammatical Competence

Simon Levy, Ofer Melnik and Jordan Pollack
{levy, melnik, pollack}@cs.brandeis.edu
Dynamical and Evolutionary Machine Organization
Volen Center for Complex Systems, Brandeis University, Waltham, MA 02454, USA
February 6, 2000

Abstract

This paper presents Infinite RAAM (IRAAM), a new fusion of recurrent neural networks with fractal geometry, allowing us to understand the behavior of these networks as dynamical systems. Our recent work with IRAAMs has shown that they are capable of generating the context-free (non-regular) language $a^nb^n$ for arbitrary values of $n$. This paper expands upon that work, showing that IRAAMs are capable of generating syntactically ambiguous languages but seem less capable of generating certain context-free constructions that are absent or disfavored in natural languages. Together, these demonstrations support our belief that IRAAMs can provide an explanatorily adequate connectionist model of grammatical competence in natural language.

Natural Language Issues

In an early and extremely influential paper, Noam Chomsky (1956) showed that natural languages (NL's) cannot be modeled by a finite-state automaton, because of the existence of center-embedded constructions. A second and equally important observation from this work was that a minimally adequate NL grammar must be ambiguous, assigning more than one structure (interpretation) to some sentences, for example, They are flying planes.

The first observation led to the development of Chomsky's formal hierarchy of languages, based on the computational resources of the machines needed to recognize them. In this hierarchy, Chomsky's observation about center-embedding is expressed by saying that NL's are non-regular; i.e., they cannot be generated by a grammar having only rules of the form $A \to bC$ or $A \to b$, where $A$ and $C$ are non-terminal symbols and $b$ is a terminal symbol. Whether NL's are merely non-regular, belonging in the next, context-free (CF) level of the Chomsky hierarchy, or are more powerful, belonging further up in the hierarchy, became the subject of heated debate (Higginbotham 1984; Postal and Langendoen 1984; Shieber 1985). Non-CF phenomena such as reduplication/copying (Culy 1985) and crossed serial dependencies (Bresnan, Kaplan, Peters, and Zaenen 1982) suggested that a more powerful approach, using syntactic transformations (Chomsky 1957), was called for, but some researchers criticized transformations as having arbitrary power and thus failing to constrain the types of languages that could be expressed (Gazdar 1982). Further criticism of the entire formal approach came from observing that even CF grammars (CFGs) had the power to generate structures, such as a sequence followed by its mirror image, that did not seem to occur in NL (Manaster-Ramer 1986), or which placed an extraordinary burden on the human parsing mechanism when they did occur (Bach, Brown, and Marslen-Wilson 1986).
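To make the rule-form distinction concrete, here is a minimal illustrative sketch (ours, not from the paper; the function names are hypothetical). It derives strings of the non-regular language $a^nb^n$ using the context-free rule $S \to aSb$, and recognizes them with a single unbounded counter, which is exactly the resource a finite-state machine lacks:

```python
# Illustrative sketch: why a^n b^n is non-regular.
# A regular (right-linear) grammar allows only rules A -> bC or A -> b.
# The context-free grammar S -> aSb | "" generates a^n b^n.

def generate_anbn(n):
    """Derive a^n b^n by applying S -> aSb n times, then S -> ''."""
    return "a" * n + "b" * n

def recognize_anbn(s):
    """Recognize a^n b^n with one unbounded counter -- the minimal
    memory that no finite-state automaton provides."""
    count = 0
    seen_b = False
    for ch in s:
        if ch == "a":
            if seen_b:          # an 'a' after a 'b' is ill-formed
                return False
            count += 1
        elif ch == "b":
            seen_b = True
            count -= 1
            if count < 0:       # more b's than a's so far
                return False
        else:
            return False
    return count == 0 and (seen_b or s == "")

assert all(recognize_anbn(generate_anbn(n)) for n in range(10))
assert not recognize_anbn("aabbb") and not recognize_anbn("abab")
```

Any device whose memory is bounded can track the counter only up to some fixed depth, and therefore accepts at best a finite, hence regular, subset of the language; this is the same issue that arises for connectionist models below.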
Connectionism and Natural Language

While debates about the complexity of NL were raging, connectionism was beginning to awaken from a fifteen-year sleep. In connectionist models many researchers found a way of embodying flexibility, graceful degradation, and other non-rigid properties that seem to characterize real cognitive systems like NL. This research culminated in the publication of a highly controversial paper by Rumelhart and McClelland (1986), which provided a connectionist account of part of the grammar of English using a feed-forward neural network. The paper was soon criticized by more traditional cognitive scientists (Fodor and Pylyshyn 1988; Pinker and Prince 1988), who cited the non-generative nature of such connectionist models as a fundamental shortcoming of the entire field. Partly in response to these criticisms, many connectionists have spent the past decade investigating network models which support generativity through recurrent (feedback) connections (Lawrence, Giles, and Fong 1998; Rodriguez, Wiles, and Elman 1999; Williams and Zipser 1989). The research we present here is an attempt to contribute to this effort while focusing as strongly as possible on the natural language issues described above. Such an attempt faces a number of challenges.

First, despite analysis of how a network's dynamics contribute to its generativity, it is often uncertain whether the dynamics can support generation of well-formed strings beyond a certain length. That is, it is unknown whether the network has a true "competence" for the language of which it has learned a few exemplars, or is merely capable of generating a finite, and hence regular, subset of the language.¹ Second, it is often easier to model weak, rather than strong, generative capacity, by building networks that generate or recognize strings having certain properties, without assigning any syntactic structure to the strings. Third, this lack of syntactic structure inhibits the formulation of an account of syntactic ambiguity in such networks, making them less plausible as models of NL.

¹ To be fair, not all connectionists, or cognitive scientists, take seriously the notion that human language has infinite generative capacity. Though we obviously do not have the resources to argue the issue here, we are certain that a model with a provably infinite competence would be more persuasive to the cognitive science community as a whole than would a model without one.
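The first challenge can be stated sharply with a toy dynamical recognizer (our sketch, loosely in the spirit of the counting dynamics that Rodriguez, Wiles, and Elman (1999) report in trained networks; the function and its constants are hypothetical, not the paper's model). The state is contracted on each a and expanded by the inverse map on each b, so a string is accepted just in case the state returns to its starting value:

```python
# Toy dynamical recognizer for a^n b^n (a sketch, not the paper's model),
# loosely after the counting dynamics reported by Rodriguez, Wiles & Elman
# (1999): contract the state on each 'a', expand it back on each 'b'.

def dynamical_recognizer(s, contract=0.5):
    h = 1.0                      # initial state of the single counting unit
    seen_b = False
    for ch in s:
        if ch == "a":
            if seen_b:           # 'a' after 'b' is ill-formed
                return False
            h *= contract        # contracting map: count one level up
        elif ch == "b":
            seen_b = True
            h /= contract        # expanding (inverse) map: count one down
            if h > 1.0:          # more b's than a's so far
                return False
        else:
            return False
    return h == 1.0 and (seen_b or s == "")

assert dynamical_recognizer("a" * 500 + "b" * 500)   # exact: powers of two
assert not dynamical_recognizer("aab")
# Finite precision caps the competence: h underflows to 0.0 near n = 1075,
# so only a finite (regular) subset of a^n b^n is actually accepted.
print(dynamical_recognizer("a" * 2000 + "b" * 2000)) # False
```

Run in double precision, the state underflows once n exceeds roughly a thousand, so this device in fact accepts only a finite, hence regular, subset of $a^nb^n$; establishing an infinite competence requires an argument about the dynamics themselves, which is the kind of argument IRAAM is intended to supply.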

References

[1] Jordan B. Pollack, et al. Fractal (Reconstructive Analogue) Memory, 1992.
[2] W. Tabor. Dynamical Automata, 1998.
[3] Tim van Gelder, et al. Compositionality: A Connectionist Variation on a Classical Theme, 1990, Cognitive Science.
[4] William D. Marslen-Wilson, et al. Crossed and Nested Dependencies in German and Dutch, 1986.
[5] C. Culy. The Complexity of the Vocabulary of Bambara, 1985.
[6] Jordan B. Pollack, et al. Analysis of Dynamical Recognizers, 1997, Neural Computation.
[7] Michael F. Barnsley, et al. Fractals Everywhere, 1988.
[8] Gerald Gazdar. Phrase Structure Grammar, 1982.
[9] S. Pinker, et al. On Language and Connectionism: Analysis of a Parallel Distributed Processing Model of Language Acquisition, 1988, Cognition.
[10] Mark Steedman. Tutorial Overview: Categorial Grammar, 1993.
[11] Stuart M. Shieber. Evidence Against the Context-Freeness of Natural Language, 1985.
[12] Stanley Peters, et al. Cross-Serial Dependencies in Dutch, 1982.
[13] Ronald J. Williams, et al. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks, 1989, Neural Computation.
[14] Jordan B. Pollack, et al. A Gradient Descent Method for a Neural Fractal Memory, 1998, Proceedings of the 1998 IEEE International Joint Conference on Neural Networks.
[15] Jordan B. Pollack, et al. Recursive Distributed Representations, 1990, Artificial Intelligence.
[16] Barry L. Kalman, et al. Tail-Recursive Distributed Representations and Simple Recurrent Networks, 1995.
[17] James L. McClelland, et al. On Learning the Past Tenses of English Verbs: Implicit Rules or Parallel Distributed Processing, 1986.
[18] J. Higginbotham. English Is Not a Context-Free Language, 1984.
[19] Alexis Manaster-Ramer, et al. Copying in Natural Languages, Context-Freeness, and Queue Grammars, 1986, ACL.
[20] John F. Kolen, et al. Exploring the Computational Capabilities of Recurrent Neural Networks, 1995.
[21] A. Sperduti. Labeling RAAM, 1994.
[22] Paul Rodríguez, et al. A Recurrent Neural Network That Learns to Count, 1999, Connection Science.
[23] D. Terence Langendoen, et al. English and the Class of Context-Free Languages, 1984, Computational Linguistics.
[24] Douglas S. Blank, et al. Exploring the Symbolic/Subsymbolic Continuum: A Case Study of RAAM, 1992.
[25] Sandiway Fong, et al. Natural Language Grammatical Inference with Recurrent Neural Networks, 2000, IEEE Transactions on Knowledge and Data Engineering.
[26] J. Fodor, et al. Connectionism and Cognitive Architecture: A Critical Analysis, 1988, Cognition.
[27] Aravind K. Joshi, et al. Tree-Adjoining Grammars, 1997, Handbook of Formal Languages.
[28] David J. Chalmers, et al. Syntactic Transformations on Distributed Representations, 1990.