Predicting the birth of a spoken word

Significance

The emergence of productive language is a critical milestone in a child’s life. Laboratory studies have identified many individual factors that contribute to word learning, and larger scale studies show correlations between aspects of the home environment and language outcomes. To date, no study has compared across many factors involved in word learning. We introduce a new ultradense set of recordings that capture a single child’s daily experience during the emergence of language. We show that words used in distinctive spatial, temporal, and linguistic contexts are produced earlier, suggesting they are easier to learn. These findings support the importance of multimodal context in word learning for one child and provide new methods for quantifying the quality of children’s language input.

Children learn words through an accumulation of interactions grounded in context. Although many factors in the learning environment have been shown to contribute to word learning in individual studies, no empirical synthesis connects across factors. We introduce a new ultradense corpus of audio and video recordings of a single child’s life that allows us to measure the child’s experience of each word in his vocabulary. This corpus provides the first direct comparison, to our knowledge, between different predictors of the child’s production of individual words. We develop a series of new measures of the distinctiveness of the spatial, temporal, and linguistic contexts in which a word appears, and show that these measures are stronger predictors of learning than frequency of use and that, unlike frequency, they play a consistent role across different syntactic categories. Our findings provide a concrete instantiation of classic ideas about the role of coherent activities in word learning and demonstrate the value of multimodal data in understanding children’s language acquisition.
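The text above does not say how the distinctiveness of a word's context is computed. As a purely illustrative sketch, one could compare the distribution of contexts in which a word is heard against the distribution of contexts across all speech, using Kullback-Leibler divergence. The choice of KL divergence, the context bins, the smoothing constant, and every function name below are assumptions made for illustration, not the authors' stated method.

```python
import math
from collections import Counter


def distribution(observations, bins, smoothing=1e-9):
    """Turn raw context labels into a smoothed probability distribution.

    `smoothing` is a hypothetical small constant that avoids zero
    probabilities; it is an assumption of this sketch.
    """
    counts = Counter(observations)
    total = sum(counts[b] for b in bins) + smoothing * len(bins)
    return {b: (counts[b] + smoothing) / total for b in bins}


def kl_divergence(p, q):
    """KL divergence D(p || q) in bits, over the shared set of bins."""
    return sum(p[b] * math.log2(p[b] / q[b]) for b in p if p[b] > 0)


def distinctiveness(word_contexts, all_contexts, bins):
    """Hypothetical distinctiveness score: how far a word's context
    distribution departs from the background distribution."""
    p = distribution(word_contexts, bins)
    q = distribution(all_contexts, bins)
    return kl_divergence(p, q)


if __name__ == "__main__":
    # Toy data: coarse time-of-day bins stand in for temporal context.
    bins = ["morning", "afternoon", "evening"]
    background = ["morning"] * 10 + ["afternoon"] * 10 + ["evening"] * 10
    concentrated = ["morning"] * 5  # a word heard only in the morning
    spread_out = ["morning", "afternoon", "evening"]  # heard at all times

    print(distinctiveness(concentrated, background, bins))
    print(distinctiveness(spread_out, background, bins))
```

Under this sketch, a word whose uses are concentrated in one context scores higher than a word spread evenly across contexts, which matches the intuition that distinctive contexts set a word apart from the background of everyday speech. The same function could be applied to spatial bins (regions of the home) or linguistic bins (topics), given suitable labels.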
