Learning Grammatical Constructions

Nancy C. Chang (nchang@icsi.berkeley.edu)
International Computer Science Institute, 1947 Center Street, Suite 600, Berkeley, CA 94704 USA

Tiago V. Maia (tmaia@cse.buffalo.edu)
State University of New York at Buffalo, 226 Bell Hall, Buffalo, NY 14260-2000 USA

Abstract

We describe a computational model of the acquisition of early grammatical constructions that exploits two essential features of the human grammar learner: significant prior conceptual and lexical knowledge, and sensitivity to the statistical properties of the input data. Such principles are shown to be useful and necessary for learning the structured mappings between form and meaning needed to represent phrasal and clausal constructions. We describe an algorithm based on Bayesian model merging that can induce a set of grammatical constructions based on simpler previously learned constructions (in the base case, lexical constructions) in combination with new utterance-situation pairs. The resulting model shows how cognitive and computational considerations can intersect to produce a course of learning consistent with data from studies of child language acquisition.

Introduction

This paper describes a model of grammar learning in which linguistic representations are grounded both in the conceptual world of the learner and in the statistical properties of the input. Precocity on both fronts has previously been exploited in models of lexical acquisition; we focus here on the shift from single words to word combinations and investigate the extent to which larger phrasal and clausal constructions can be learned using principles similar to those employed in word learning. Our model makes strong assumptions about prior knowledge, both ontological and linguistic, on the part of the learner, taking as both inspiration and constraint the course of development observed in crosslinguistic studies of child language acquisition.
After describing our assumptions, we address the representational complexities associated with larger grammatical constructions. In the framework of Construction Grammar (Goldberg, 1995), these constructions can, like single-word constructions, be viewed as mappings between the two domains of form and meaning, where form typically refers to the speech or text stream and meaning refers to a rich conceptual ontology. In particular, they also involve relations among multiple entities in both form (e.g., multiple words and/or phonological units) and meaning (multiple participants in a scene), as well as mappings across relations in these two domains. We introduce a simple formalism capable of representing such relational constraints.

The remainder of the paper casts the learning problem in terms of two interacting processes, construction hypothesis and construction reorganization, and presents an algorithm based on Bayesian model merging (Stolcke, 1994) that attempts to induce the set of constructions that best fits previously seen data and generalizes to new data. We conclude by discussing some of the broader implications of the model for language learning and use.

Conceptual and lexical prerequisites

Children learning their earliest word combinations bring considerable prior knowledge to the task. Our model of grammar learning makes several assumptions intended to capture this knowledge, falling into two broad categories: representational requirements for ontological knowledge, and the ability to acquire lexical mappings.

Infants inhabit a dynamic world of continuous percepts, and how they process and represent these fluid sensations remains poorly understood. By the time they are learning grammar, however, they have amassed a substantial repertoire of concepts corresponding to people, objects, settings and actions (Bloom, 1973; Bloom, 2000).
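The flavor of the Bayesian model merging procedure mentioned above can be conveyed with a deliberately simplified sketch. This is an illustrative assumption-laden toy, not the paper's actual formulation: constructions are reduced to bags of tokens, the posterior is approximated by a data-fit score minus a complexity (prior) penalty, and the merge operator generalizes two constructions to their shared elements.

```python
from itertools import combinations

def description_length(grammar):
    # Prior term: smaller grammars (fewer, simpler constructions) score better.
    return sum(len(c) for c in grammar)

def fit(grammar, data):
    # Likelihood term: number of observed pairs covered by some construction.
    return sum(1 for d in data if any(set(c) <= set(d) for c in grammar))

def score(grammar, data, alpha):
    # Posterior-style score: data fit minus a weighted complexity penalty.
    return fit(grammar, data) - alpha * description_length(grammar)

def merge(c1, c2):
    # Merge two constructions by keeping their shared elements (a generalization).
    return tuple(sorted(set(c1) & set(c2)))

def model_merging(data, alpha=0.5):
    # Start with one maximally specific construction per utterance-situation pair,
    # then greedily apply merges as long as they improve the score.
    grammar = {tuple(sorted(d)) for d in data}
    improved = True
    while improved:
        improved = False
        best = score(grammar, data, alpha)
        for c1, c2 in combinations(grammar, 2):
            m = merge(c1, c2)
            if not m:
                continue
            candidate = (grammar - {c1, c2}) | {m}
            if score(candidate, data, alpha) > best:
                grammar, improved = candidate, True
                break
    return grammar
```

On input such as [("throw", "ball"), ("throw", "block")], the two maximally specific constructions merge into the generalization ("throw",), since the merged grammar covers the same data at lower description length.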
Children at this stage are also competent event participants who have acquired richly structured knowledge about how different entities can interact (Tomasello, 1992; Slobin, 1985), as well as sophisticated pragmatic skills that allow them to determine referential intent (Bloom, 2000). Few computational models of word learning have addressed the general problem of how such sensorimotor and social-cultural savvy is acquired. Several models, however, have tackled the simpler problem of how labels (either speech or text) become statistically associated with concepts in restricted semantic domains, such as spatial relations (Regier, 1996), objects and attributes (Roy and Pentland, 1998), and actions (Bailey et al., 1997; Siskind, 2000). Such models assume either explicitly or implicitly that lexical items can be represented as maps (i.e., bidirectional associations) between representations of form and meaning that are acquired on the basis of input associations.* Most of these also produce word senses whose meanings exhibit category and similarity effects.

* Typically, supervised or unsupervised training is used to induce word categories from sensorimotor input, which is described using continuous or discrete features; models vary in the degree of inductive bias present in the input feature space.
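The cross-situational association strategy these word-learning models share can be sketched as follows. This is a minimal sketch under strong simplifying assumptions: word forms and meaning elements are treated as discrete symbols, whereas the cited models operate over continuous or featural sensorimotor input.

```python
from collections import defaultdict

def cross_situational(pairs):
    """Map each word to the meaning element it co-occurs with most often
    across utterance-situation pairs (a simple bidirectional association)."""
    counts = defaultdict(lambda: defaultdict(int))
    for words, meanings in pairs:
        for w in words:
            for m in meanings:
                counts[w][m] += 1
    # The learned lexicon: each word's most frequent co-occurring meaning.
    return {w: max(ms, key=ms.get) for w, ms in counts.items()}
```

Given pairs such as (["ball"], ["BALL", "RED"]) and (["ball", "block"], ["BALL", "BLOCK"]), spurious associations like ball-RED occur in only one situation, so consistent co-occurrence across situations lets "ball" settle on BALL.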

References

[1] Bloom, L. (1973). One Word at a Time: The Use of Single Word Utterances Before Syntax.
[2] Langacker, R. W. (1983). Foundations of Cognitive Grammar.
[3] Slobin, D. (1985). The Crosslinguistic Study of Language Acquisition.
[4] Lakoff, G. (1987). Women, Fire, and Dangerous Things: What Categories Reveal about the Mind.
[5] Langacker, R. W. (1990). Concept, Image, and Symbol.
[6] MacWhinney, B. (1992). The CHILDES Project: Tools for Analyzing Talk.
[7] Tomasello, M. (1992). First Verbs: A Case Study of Early Grammatical Development.
[8] Stolcke, A., & Omohundro, S. (1994). Inducing Probabilistic Grammars by Bayesian Model Merging. In ICGI.
[9] Stolcke, A. (1994). Bayesian Learning of Probabilistic Language Models.
[10] Siskind, J. M. (1996). A Computational Study of Cross-Situational Techniques for Learning Word-to-Meaning Mappings. Cognition.
[11] Bailey, D. (1997). When Push Comes to Shove: A Computational Model of the Role of Motor Control in the Acquisition of Action Verbs.
[12] Bailey, D., Feldman, J. A., et al. (1997). Modeling Embodied Lexical Development.
[13] Behrens, H. (1998). From Construction to Construction: Commentary on Michael Tomasello's review essay 'The Return of Constructions' on Adele Goldberg's (1995) 'Constructions: A Construction Grammar Approach to Argument Structure'.
[14] Gasser, M., et al. (2000). Babies, Variables, and Relational Correlations.
[15] Bloom, P. (2000). How Children Learn the Meanings of Words.
[16] Siskind, J. M. (2000). Visual Event Classification via Force Dynamics. In AAAI/IAAI.