Exploring the Role of Representation in Models of Grammatical Category Acquisition

Ting Qian (tqian@bcs.rochester.edu)1, Patricia A. Reeder (preeder@bcs.rochester.edu)1, Richard N. Aslin (aslin@cvs.rochester.edu)1, Josh B. Tenenbaum (jbt@mit.edu)2, Elissa L. Newport (newport@bcs.rochester.edu)1

1 Department of Brain & Cognitive Sciences, University of Rochester
2 Department of Brain & Cognitive Sciences, MIT

Abstract

One major aspect of successful language acquisition is the ability to generalize from properties of experienced items to novel items. We present a computational study of artificial language learning, in which the generalization patterns of three generative models are compared to those of human learners across 10 experiments. Results suggest that a model with an explicit representation of word categories best captures the generalization patterns of human learners across a wide range of learning environments. We discuss the representational assumptions implied by these models.

Introduction

Learning the grammar of a language consists of at least two important tasks. First, learners must discover the cues in the linguistic input that are useful for constructing the grammar of the language. Second, learners must represent their knowledge of the grammar in a form that makes it possible to assess the grammaticality of future input. With an appropriate representation of the grammar, learners can generalize from properties of the small set of experienced items to predicted properties of novel items. This ability to generalize is crucial for language acquisition, because the input for learning is naturally limited. Such generalization should extend only to the novel items that are actually licensed by the language, no more (over-generalization) and no less (under-generalization). Previous research has offered several hypotheses regarding the cues that learners use and the representations of grammar they form.

In the realm of syntactic category acquisition, one hypothesis is that the categories (but not their contents) are innately specified prior to receiving any linguistic input, with the assignment of words to categories accomplished with minimal exposure (e.g., McNeill, 1966). On this view, both the cues and the representations are predefined and independent of linguistic input. A contrasting view holds that grammatical categories are learned, though different hypotheses appeal to the importance of different cues or cue combinations during the learning process (such as semantic cues, e.g., Bowerman, 1973). Within this class of non-nativist hypotheses, several studies have suggested that distributional cues may be sufficient for extracting the grammar of the input language (e.g., Braine, 1987; Maratsos & Chalkley, 1980; Mintz et al., 2002). Distributional cues are defined over patterns in the linguistic input, such as token frequencies, co-occurrence statistics, and latent structural dependencies between linguistic elements. Although studies have shown that human learners and computational models can successfully learn grammatical categories when only these cues are available, the question of representation remains poorly understood: how do learners represent their knowledge of previously encountered linguistic items in order to generalize to novel ones?

The aim of the present work is to ask what types of representations are used by human learners in an artificial grammar learning (AGL) task that includes many of the distributional properties of spoken language. We focus on how learners induce grammatical categories and assign words to them. Our approach involves computational modeling: we compare the simulated learning outcomes of three different models, each of which makes a different assumption about how learners represent the learned grammar. We assess the models by comparing the generalization patterns of each model with those of human learners. Our experimental data come from our previous findings across 10 AGL experiments (Reeder et al., in review; Schuler et al., in prep); the next section provides a brief summary of these results. Importantly, the goal of our modeling work is not to mirror every detail of human behavior in AGL experiments: to do so, one would need to consider psychological variables such as memory and attention, which are currently not included in our models. Instead, we are interested in exploring the representational assumptions that human learners have adopted in our experiments.

Background on Behavioral Results

The behavioral data come from a series of 10 experiments with adult participants in which we created an artificial grammar with the structure (Q)AXB(R). Each letter represents a category of nonsense words. Q and R words served as optional categories, so that sentences of the language varied in length from 3 to 5 words and words of the language patterned in terms of relative order rather than fixed position. The sizes of the categories varied across experiments, leading to different numbers of possible sentences in the language (see the illustrative sketch below). For ease of presentation, we will number the experiments. In Experiments 1-4 (Reeder et al., 2009), there were 108 possible sentences that could be created from this grammar; in Experiment 5 (Reeder et al., 2009), there were 576 possible sentences; in Experiments 6-10 (Reeder et al., 2010;
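
To make the combinatorics of the (Q)AXB(R) design concrete, the following Python sketch enumerates every sentence licensed by such a grammar. The category sizes and nonsense words in it are hypothetical placeholders rather than the actual experimental materials; they are chosen so that the grammar yields 108 sentences, matching the count reported above for Experiments 1-4, but the category sizes used in the experiments themselves may differ.

    from itertools import product

    def enumerate_sentences(Q, A, X, B, R):
        """Enumerate all sentences licensed by a (Q)AXB(R) grammar.

        Q and R are optional categories: a sentence may omit the Q word,
        the R word, or both, so sentence length varies from 3 to 5 words
        and words pattern by relative order rather than fixed position.
        """
        sentences = []
        for q in [None] + list(Q):                # optional initial Q slot
            for a, x, b in product(A, X, B):      # obligatory A X B core
                for r in [None] + list(R):        # optional final R slot
                    sentences.append(tuple(w for w in (q, a, x, b, r) if w is not None))
        return sentences

    # Hypothetical category contents (placeholder nonsense words, not the
    # actual stimuli): 3 words each in A, X, and B, and 1 word each in Q
    # and R, giving 2 * (3 * 3 * 3) * 2 = 108 possible sentences.
    Q = ["ker"]
    A = ["daf", "sib", "klim"]
    X = ["mib", "lap", "tiz"]
    B = ["rud", "fen", "lum"]
    R = ["nav"]

    sentences = enumerate_sentences(Q, A, X, B, R)
    print(len(sentences))                                      # 108
    print(min(map(len, sentences)), max(map(len, sentences)))  # 3 5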