Learning to Pronounce First Words in Three Languages: An Investigation of Caregiver and Infant Behavior Using a Computational Model of an Infant

Words are made up of speech sounds. Almost all accounts of child speech development assume that children learn the pronunciation of first language (L1) speech sounds by imitation, and most claim that the child performs some kind of auditory matching to the elements of ambient speech. However, there is evidence to support an alternative account. Using Elija, a computational model of an infant, we investigate the non-imitative child behavior and the well-attested caregiver behavior that this account posits. Through unsupervised active learning, Elija began by discovering motor patterns that produced sounds. In separate interaction experiments, native speakers of English, French and German then played the role of his caregiver. In their first interactions with Elija, they were allowed to respond to his sounds if they felt this was natural. We analyzed the interactions through phonemic transcriptions of the caregivers' utterances and found that they interpreted his output within the framework of their native languages. Their response was almost always a reformulation of Elija's utterance into well-formed sounds of L1. Elija retained those motor patterns to which a caregiver responded and formed an association between each motor pattern and the response it provoked. Thus, in a second phase of interaction, he was able to parse input utterances in terms of the caregiver responses he had heard previously, and to respond using the associated motor patterns. This capacity enabled the caregivers to teach Elija to pronounce some simple words in their native languages through his serial imitation of the words' component speech sounds. Overall, our results demonstrate that the natural responses and behaviors of human subjects to infant-like vocalizations can take a computational model from a biologically plausible initial state through to word pronunciation. This provides support for an alternative to current auditory matching hypotheses for how children learn to pronounce.
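The two interaction phases described above can be illustrated with a minimal sketch. It is not Elija's implementation: the class name `AssociativeLearner`, its method names, and the use of plain string labels for motor patterns and caregiver responses are all assumptions made for illustration. In particular, the sketch assumes that an input utterance has already been segmented into labels, so it does not model how acoustic input is matched to stored responses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AssociativeLearner:
    """Toy stand-in for the two interaction phases (hypothetical names)."""

    # Maps a caregiver response (a symbolic label here, by assumption)
    # to the motor pattern that provoked it.
    associations: Dict[str, str] = field(default_factory=dict)

    def first_phase(self, motor_pattern: str, response: Optional[str]) -> None:
        """Retain a motor pattern only if the caregiver responded to it."""
        if response is not None:
            self.associations[response] = motor_pattern

    def second_phase(self, utterance: List[str]) -> List[str]:
        """Reply to a pre-segmented utterance with the motor patterns
        associated with each previously heard response; unknown
        segments are skipped."""
        return [self.associations[s] for s in utterance if s in self.associations]


if __name__ == "__main__":
    learner = AssociativeLearner()
    learner.first_phase("motor_07", "ba")
    learner.first_phase("motor_12", None)  # no caregiver response: not retained
    learner.first_phase("motor_31", "bi")
    # Serial imitation of a word's component sounds.
    print(learner.second_phase(["ba", "bi"]))  # ['motor_07', 'motor_31']
```

Keying the store by caregiver response rather than by motor pattern reflects the direction of use in the second phase, where what is heard selects what is produced.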
