Principles of Generalization for Learning Sequential Structure in Language

Michael C. Frank, Denise Ichinco, and Joshua B. Tenenbaum
{mcfrank, ithink, jbt}@mit.edu
Department of Brain and Cognitive Sciences, 43 Vassar Street, Cambridge, MA 02139 USA

Abstract

How do learners discover patterns in the sequential structure of their language? Infants and adults have surprising abilities to learn structure in simple artificial languages, but the mechanisms are unknown. Here we introduce a rule-based Bayesian model incorporating two principles: minimal generalization and representational parsimony. We apply our model to tasks in artificial language learning and inflectional morphology and show that it fits behavioral results from infants and adults and learns inflectional rules from natural data.

Keywords: Language acquisition; generalization; artificial language learning; inflectional morphology; Bayesian modeling.

Introduction

How do learners discover patterns in the sequential structure of their language? Experimental work on the unsupervised learning of sequential structure has suggested that infants and adults have access to flexible and powerful learning mechanisms which may be involved in language acquisition (Gomez, 2002; Marcus et al., 1999). However, both the particular mechanisms involved in these tasks and the aspects of acquisition to which they apply are at present unknown.

In our current work we attempt to address these questions by creating a computational model which embodies two principles suggested by this experimental literature: minimal generalization and representational parsimony. We show that these principles apply not only to artificial language tasks, but that they may also have applications to learning inflectional morphology, an important task facing language learners.

We first describe our model and how it embodies a trade-off between these two principles within a hypothesis space expressive enough to capture many different types of rules. We next show how our model can be applied to artificial language experiments on learning identity rules (Gerken, 2006; Marcus et al., 1999) and non-adjacent dependencies (Gomez, 2002). We then present an extension of our model to the case of inflectional morphology. Finally, we show preliminary data indicating that our model can be applied directly to learning inflectional rules in natural language.

The representations and learning mechanisms involved in the acquisition of inflectional morphology have been hotly debated in the literature on language acquisition. Two basic positions have been proposed: a single process of analogical learning (Rumelhart & McClelland, 1986) or a dual system consisting of both abstract rules and associative processes (Pinker, 1991). While this debate has been taken as representative of a wider debate over the format of mental representation, it has nevertheless tended to confound a number of independent computational issues.

Proponents of analogical or associative theories have emphasized the parsimony and neural plausibility of this type of proposal. In contrast, dual-route theorists have focused on representational or expressive limitations of the analogical approach. There are two dissociable issues captured by this debate: (1) the number of routes for morphology learning and (2) the algorithmic form and expressive power of those routes. For instance, recent work by Albright & Hayes (2003) compared an analogical model with a rule-based model and found that the greater expressivity of the rule-based model allowed for tighter generalization and better fit to human experimental data in a novel-word inflection wug task, despite the fact that both models had only one route for representation.

Under a more general definition of a rule as a systematic regularity, rules can be both broad (as in the regular rule for the past tense in English orthography, +ed) and narrow (as in the past tense rule for the verb go: go → went). Within an expressive enough hypothesis space, a rule could even be formulated for analogical inferences like using inflections from stems with high similarity.¹ If we assume that the hypothesis space of rules is broad enough to capture many different types of regularities, the problem of how to find the right rule within this hypothesis space becomes more important.

Our current work is not directly concerned with the exact form of the representations used by human learners. Instead, we assume that learners are attempting to make generalizations from limited data within some hypothesis space and focus on the principles by which they find the best generalizations in that space. Following Albright & Hayes (2003), our hypothesis space consists of sets of explicit rules, both for their ease of interpretation and because Albright and Hayes' data show that this kind of representation provides a better fit to human generalizations. However, we take rules to be a representational convenience which we adopt at the highest of Marr's (1982) levels of analysis: the level of computational theory. Thus, we focus here not on testing different kinds of representations, but instead on making explicit and individually testing the principles of generalization by which particular rules are learned.

Model Design

We formalize the idea of a rule as a set of restrictions on the features of a string. For instance, Marcus et al. (1999) presented infants with strings like wo fe fe (three-syllable strings where the last two syllables were the same). In our

¹ For instance, though the hypothesis space of our current model does not allow similarity-based rules (e.g., "strings within some edit distance of X"), it would be relatively simple to add such rules.
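To make the formalization concrete, the sketch below (ours, not the authors' implementation) represents a rule as a set of restrictions on syllable positions, such as the identity restriction behind Marcus et al.'s ABB strings, and scores candidate rules by trading off representational parsimony (a prior that penalizes extra restrictions) against minimal generalization (a size-principle likelihood that favors rules licensing fewer strings). The syllable inventory, penalty weight, and function names are illustrative assumptions, not details taken from the paper.

import math
from itertools import product

SYLLABLES = ["wo", "fe", "ga", "ti", "de", "li"]  # toy syllable inventory (assumption)
STRING_LEN = 3

def make_abb_rule():
    # Identity rule like Marcus et al.'s ABB pattern: position 0 is free,
    # positions 1 and 2 must carry the same syllable.
    return {"identity": [(1, 2)], "fixed": {}}

def satisfies(rule, string):
    # A string satisfies a rule if it meets every restriction.
    for pos, syl in rule["fixed"].items():
        if string[pos] != syl:
            return False
    for i, j in rule["identity"]:
        if string[i] != string[j]:
            return False
    return True

def extension_size(rule):
    # Number of strings in the (finite, toy) string space that the rule licenses.
    return sum(satisfies(rule, s) for s in product(SYLLABLES, repeat=STRING_LEN))

def log_score(rule, data, prior_penalty=1.0):
    # Representational parsimony: each restriction costs prior_penalty nats.
    # Minimal generalization: each observed string contributes -log(extension size),
    # so narrower consistent rules are preferred (the size principle).
    n_restrictions = len(rule["identity"]) + len(rule["fixed"])
    log_prior = -prior_penalty * n_restrictions
    size = extension_size(rule)
    log_lik = 0.0
    for s in data:
        if not satisfies(rule, s):
            return float("-inf")  # rules inconsistent with the data get zero posterior
        log_lik += -math.log(size)
    return log_prior + log_lik

if __name__ == "__main__":
    data = [("wo", "fe", "fe"), ("ga", "ti", "ti")]   # ABB familiarization strings
    abb = make_abb_rule()                              # narrow identity rule
    anything = {"identity": [], "fixed": {}}           # broad rule: any three syllables
    print("ABB identity rule:", round(log_score(abb, data), 2))
    print("Unrestricted rule:", round(log_score(anything, data), 2))

On the two ABB familiarization strings, the identity rule outscores the unrestricted rule in this toy space: its extra restriction costs a small prior penalty, but its smaller extension (36 strings versus 216) earns a much larger likelihood. That is the trade-off between parsimony and minimal generalization the model is built around.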

References

[1]  Luce, R. D. (1963). Detection and recognition. In R. D. Luce, R. R. Bush, & E. Galanter (Eds.), Handbook of mathematical psychology: Vol. I. New York: Wiley.

[2]  Marcus, G. F., Vijayan, S., Bandi Rao, S., & Vishton, P. M. (1999). Rule learning by seven-month-old infants. Science, 283, 77–80.

[3]  Gómez, R. L. (2002). Variability and detection of invariant structure. Psychological Science, 13, 431–436.

[4]  Rasmussen, C. E. (1999). The infinite Gaussian mixture model. In Advances in Neural Information Processing Systems (NIPS).

[5]  Rumelhart, D. E., & McClelland, J. L. (1986). On learning the past tenses of English verbs: Implicit rules or parallel distributed processing? In Parallel distributed processing: Explorations in the microstructure of cognition, Vol. 2. Cambridge, MA: MIT Press.

[6]  Albright, A., & Hayes, B. (2003). Rules vs. analogy in English past tenses: A computational/experimental study. Cognition, 90, 119–161.

[7]  MacKay, D. J. C. (2003). Information theory, inference, and learning algorithms. Cambridge: Cambridge University Press.

[8]  Luce, R. D., Bush, R. R., & Galanter, E. (Eds.). (1963). Handbook of mathematical psychology: Vol. I. New York: Wiley.

[9]  Goldwater, S., Griffiths, T. L., & Johnson, M. (2006). Interpolating between types and tokens by estimating power-law generators. In Advances in Neural Information Processing Systems 18.

[10]  Gerken, L. (2006). Decisions, decisions: Infant language learning when multiple generalizations are possible. Cognition, 98, B67–B74.

[11]  Rumelhart, D. E., McClelland, J. L., & the PDP Research Group. (1986). Parallel distributed processing: Explorations in the microstructure of cognition, Vol. 1: Foundations. Cambridge, MA: MIT Press.

[12]  Tenenbaum, J. B., & Griffiths, T. L. (2001). Generalization, similarity, and Bayesian inference. Behavioral and Brain Sciences, 24, 629–640.

[13]  Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. San Francisco: W. H. Freeman.

[14]  Pinker, S. (1991). Rules of language. Science, 253, 530–535.