Verbs are LookING good in early language acquisition - eScholarship

Verbs are LookING Good in Language Acquisition

Jon A. Willits (wilits@wisc.edu)
Department of Psychology, 1202 W. Johnson Street
Madison, WI 53706 USA

Mark S. Seidenberg (seidenberg@wisc.edu)
Department of Psychology, 1202 W. Johnson Street
Madison, WI 53706 USA

Jenny R. Saffran (saffran@wisc.edu)
Department of Psychology, 1202 W. Johnson Street
Madison, WI 53706 USA

Abstract

Statistical learning is an important element of language acquisition. A basic unresolved question is, what are the units over which statistics are calculated? In a corpus study and two infant behavioral experiments, we show that varying the units that are used greatly affects learning. Using words as units, nouns are easier to segment from continuous speech than verbs. However, if a highly frequent morphological element such as ING is also treated as a unit, noun-verb differences disappear, in both corpus analysis and behavioral studies. These results suggest that infants can compute statistics over units other than words and syllables, and that theories of statistical learning need better accounts of why some units are tracked and not others.

Keywords: Statistical learning; Language acquisition; Word segmentation; Morphology; Distributional statistics

What Do Infants Count?

Studies of infants and young children have established that statistical learning plays an important role in language acquisition (see Saffran & Sahni, 2007, for a review). A fundamental question for theories of statistical learning is, what units are statistics computed over? Many statistics can be derived from natural languages, a fact that could limit the role of statistical processes in acquisition, because it makes it harder for the language learner to determine which statistics to use. For researchers in the area, the problem is to identify the units that are tracked and to determine why these units are tracked and why others are not.
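
The question of units can be made concrete with a forward transition probability, commonly defined as TP(B | A) = frequency(AB) / frequency(A), the probability that unit B follows unit A. The sketch below is a toy illustration only, not the corpus analysis reported in this paper. It computes that statistic over two segmentations of the same made-up utterances: whole words, and words with a final "ing" split off as its own unit. The utterances, the function names, and the naive suffix-splitting rule are all assumptions introduced for illustration.

```python
from collections import Counter


def split_ing(word):
    """Naively split a trailing 'ing' into its own unit (hypothetical rule)."""
    if word.endswith("ing") and len(word) > 4:
        return [word[:-3], "ING"]
    return [word]


def tokenize(utterance, ing_as_unit=False):
    """Turn an utterance into a list of units: words, or words plus ING."""
    units = []
    for word in utterance.lower().split():
        units.extend(split_ing(word) if ing_as_unit else [word])
    return units


def transition_probs(utterances, ing_as_unit=False):
    """Forward TP for each adjacent pair: count(a, b) / count(a as a pair's first unit).

    Transitions are counted within utterances only, so an utterance-final
    unit does not contribute to the denominator.
    """
    pair_counts = Counter()
    first_counts = Counter()
    for utterance in utterances:
        units = tokenize(utterance, ing_as_unit)
        for a, b in zip(units, units[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


# Made-up child-directed utterances (assumption, not data from the study).
toy = [
    "you are looking at the doggy",
    "you are kicking the ball",
    "you are eating the cookie",
    "look at the kitty",
]

word_tp = transition_probs(toy)                    # units = words
unit_tp = transition_probs(toy, ing_as_unit=True)  # units = words + ING
```

In this toy data, the word-level transition from "are" to "looking" has probability 1/3, whereas with ING as a unit the transition from "kick" to ING has probability 1.0 and from "look" to ING 0.5. The same input therefore yields different statistics depending on what is counted.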
The literature on statistical language learning in infants, children, and adults has often focused on transition probabilities within and between words (i.e., the probability that one unit will follow another). For example, Saffran, Aslin, and Newport (1996) manipulated transition probabilities between syllables within a word (which were high) compared to probabilities between syllables at word boundaries (which were low). These statistical heterogeneities provided a basis for identifying words in a simple artificial language. Many subsequent studies have focused on transition probabilities in both artificial and natural languages. Research has also begun to look at other types of dependencies, for example between non-adjacent syllables or words.

All such experiments make assumptions about the units over which infants encode statistics such as frequency and transition probability. Syllables, for example, seem like obvious units given their fundamental role in speech production. Different units may be tracked at different points in development. As the child’s vocabulary develops, so does the possibility of tracking word-level statistics. Moreover, statistical learning may occur at multiple levels of linguistic structure simultaneously. Thus, the question of which units statistics are computed over is a central one. The answer will affect the extent to which statistical learning is implicated in acquisition.

We examined this question in the context of a puzzle in the language learning literature. In an important study, Jusczyk and Aslin (1995) found that 7.5-month-old infants could identify nouns from fluent, continuous speech. They played infants a two-minute corpus of typical, child-directed speech that repetitively used the same two nouns, and found during a test phase that infants discriminated between the nouns that had been played and frequency-matched nouns that had not been played. However, in a later study using a similar procedure, Nazzi et al. (2005) found that verbs were not identifiable until between 13.5 and 17.5 months. Thus nouns and verbs appear to differ in ease of learning.

This difference could arise from intrinsic differences between nouns and verbs: verbs could be more complex because they encode relations, the relations can involve a variety of different elements in a sentence, and these relations can be expressed in a number of different syntactic structures (Gentner, 2006). It is also possible that the statistical properties of nouns and verbs differ, such that whereas nouns can be identified based on the immediate contexts in which they occur, verbs cannot. Identifying verbs might then require the use of other information such as syllabic stress, which infants master at later ages than transition probability (Saffran & Thiessen, 2003).

Our study investigated this idea, but with an important twist: the learnability of nouns and verbs from statistical information (the frequencies of words and the immediate lexical contexts in which they occur) crucially depends on assumptions about the units over which the child computes such statistics. In particular, we examined the role of the highly frequent bound morpheme ING. Typically, ING is