Characterizing Motherese: On the Computational Structure of Child-Directed Language

Heidi Waterfall (he32@cornell.edu)¹
Peter Brodsky (pb86@cornell.edu)
Shimon Edelman (se37@cornell.edu)
Department of Psychology, Cornell University
Ithaca, NY 14853 USA

Abstract

We report a quantitative analysis of the cross-utterance coordination observed in child-directed language, where successive utterances often overlap in a manner that makes their constituent structure more prominent, and describe the application of a recently published unsupervised algorithm for grammar induction to the largest available corpus of such language, producing a grammar capable of accepting and generating novel well-formed sentences. We also introduce a new corpus-based method for assessing the precision and recall of an automatically acquired generative grammar without recourse to human judgment. The present work sets the stage for the eventual development of more powerful unsupervised algorithms for language acquisition, which would make use of the coordination structures present in natural child-directed speech.

Keywords: Language acquisition; grammar inference; computational linguistics.

Introduction

Does child-directed speech (what Newport, Gleitman, and Gleitman (1977) called “Motherese”) possess special characteristics that make it easier to learn from? In this paper, we present two kinds of corpus-based evidence that should be useful in addressing this question. First, we report a quantitative analysis of the cross-utterance coordination observed in child-directed language, where successive utterances often overlap in a manner that makes their constituent structure more prominent. Second, we describe the application of a recently published unsupervised algorithm for grammar induction to the largest available corpus of child-directed language, and the performance of the resulting grammar in accepting and generating novel well-formed sentences. This work sets the stage for the development of more powerful unsupervised algorithms for language acquisition, which would make use of the coordinated structures present in natural child-directed speech.

Cross-utterance coordination in Motherese

There is a great deal of evidence suggesting that parents produce structured dialogues when talking with very young children. Parents’ speech to young children is highly repetitive and often includes clusters of partial self-repetitions (variation sets) when speaking to young children acquiring language (Furrow, Nelson, & Benedict, 1979; Kavanaugh & Jirovsky, 1982; Kaye, 1980; Snow, 1972; Hoff-Ginsberg, 1985, 1986, 1990; Küntay & Slobin, 1996; Waterfall, 2006).

Variation sets

Hoff-Ginsberg (1985) conducted one of the initial examinations of the effect of maternal self-repetitions on children’s progress in language acquisition. She showed that alternations in maternal self-repetitions that conformed to major constituent boundaries were related to growth in children’s verb use, while those repetitions that altered material within a phrasal constituent aided in noun-phrase growth. In a subsequent study, Hoff-Ginsberg (1986) found that the frequency of self-repetitions and expansions was positively correlated with child verb phrase development. Similarly, Hoff-Ginsberg (1990) confirmed that maternal self-repetitions and expansions were positively correlated with the average number of verbs per utterance in child speech.

Hoff-Ginsberg’s analyses, however, concentrated on the corpus as a whole and did not examine the contingent nature of clusters of such repetitions. Küntay and Slobin (1996) pioneered the research into variation sets, conducting the first longitudinal study specifically analyzing the effect of local clusters of partial repetitions in child-directed speech on language development. Focusing on the acquisition of Turkish, they found that variation sets made up approximately 20% of child-directed speech. The use of variation sets was positively associated with children’s acquisition of specific verbs.

In sum, variation sets seem to be ideal environments for learning lexical items and constituent structures. By holding most of the utterance constant, while altering it slightly (see Table 1 for an example), parents may allow children to discover lexical items, syntactic constituents, and their place in the syntax, vis-à-vis comparison and contrast, as envisaged (in the context of the discovery of grammar by linguists) by Zellig Harris (1946).

Waterfall (2006) conducted the first longitudinal study of variation sets in English. We briefly mention here some of her findings (Waterfall, 2007). The participants were twelve parent-child dyads (ages 14-30 months). The subjects were balanced for child gender, child birth or-

¹ Also with the Department of Psychology, University of Chicago, Chicago, IL 60637 USA.
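To make the notion of a variation set concrete, the following Python sketch groups successive utterances that share lexical material. It is an illustration only. The overlap criterion (at least two shared word types between adjacent utterances, with exact repetitions excluded), the function names, and the sample utterances are assumptions made here for exposition; they are not the coding scheme used in the studies cited above, and the utterances are invented rather than drawn from any corpus.

```python
from typing import List


def tokenize(utterance: str) -> List[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(".,!?;:\"'") for w in utterance.lower().split())
    return [w for w in words if w]


def shares_material(a: str, b: str, min_shared: int = 2) -> bool:
    """Assumed criterion: two utterances are partial repetitions if they
    differ and have at least `min_shared` word types in common."""
    ta, tb = tokenize(a), tokenize(b)
    if ta == tb:  # exact repetition, so nothing is varied
        return False
    return len(set(ta) & set(tb)) >= min_shared


def variation_sets(utterances: List[str], min_shared: int = 2) -> List[List[str]]:
    """Collect maximal runs of consecutive utterances in which every
    adjacent pair satisfies `shares_material`."""
    sets: List[List[str]] = []
    current: List[str] = []
    for utt in utterances:
        if current and shares_material(current[-1], utt, min_shared):
            current.append(utt)
        else:
            if len(current) > 1:
                sets.append(current)
            current = [utt]
    if len(current) > 1:
        sets.append(current)
    return sets


def proportion_in_sets(utterances: List[str], sets: List[List[str]]) -> float:
    """Fraction of utterances that belong to some variation set."""
    if not utterances:
        return 0.0
    return sum(len(s) for s in sets) / len(utterances)


if __name__ == "__main__":
    # Invented sample, for illustration only.
    sample = [
        "Look at the doggie.",
        "Look at the nice doggie.",
        "See the doggie?",
        "Time for lunch.",
        "Do you want some juice?",
        "Want some juice?",
    ]
    found = variation_sets(sample)
    for group in found:
        print(group)
    print(proportion_in_sets(sample, found))
```

On the invented sample, the sketch finds two groups (the three "doggie" utterances and the two "juice" utterances). A proportion computed this way is the kind of corpus-level figure reported by Küntay and Slobin (1996), although the value depends heavily on the overlap threshold chosen.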

[1] B. MacWhinney. The CHILDES project: tools for analyzing talk, 1992.

[2] E. Hoff-Ginsberg. Maternal speech and the child's development of syntax: a further look, 1990, Journal of Child Language.

[3] Dan Klein, et al. Natural Language Grammar Induction Using a Constituent-Context Model, 2001, NIPS.

[4] Z. Harris. From Morpheme to Utterance, 1946.

[5] Barbara C. Scholz, et al. Empirical assessment of stimulus poverty arguments, 2002.

[6] E. Hoff-Ginsberg, et al. Function and structure in maternal speech: Their relation to the child's development of syntax, 1986.

[7] H. Gleitman, et al. Mother, I'd rather do it myself: Some effects and non-effects of maternal speech style, 1977.

[8] E. Hoff-Ginsberg, et al. Some contributions of mothers' speech to their children's syntactic growth, 1985, Journal of Child Language.

[9] C. Snow. Mothers' Speech to Children Learning Language, 1972.

[10] Robert Tibshirani, et al. An Introduction to the Bootstrap, 1994.

[11] K. Kaye, et al. Why we don't talk ‘baby talk’ to babies, 1980, Journal of Child Language.

[12] Eytan Ruppin, et al. Unsupervised learning of natural languages, 2006.

[13] K. Nelson, et al. Mothers' speech to children and syntactic development: some simple relationships, 1979, Journal of Child Language.

[14] Joshua Goodman, et al. A bit of progress in language modeling, 2001, Comput. Speech Lang.

[15] R. D. Kavanaugh, et al. Parental Speech to Young Children: A Longitudinal Analysis, 1982.