Learning from a tutor: Embodied speech acquisition and imitation learning

This work presents a new developmentally inspired, data-driven framework for bootstrapping speech perception and imitation abilities through interaction with a tutor. The proposed system architecture extends our work presented in [1], which implements a cascade of interconnected layers to acquire the structure of speech in terms of phones, syllables and words. Here, we show how to couple this perceptual model with a speech imitation system based on an acoustic synthesizer constrained to produce speech sounds with a child's voice.

[1] Naoto Iwahashi, et al. Robots That Learn Language: Developmental Approach to Human-Machine Conversations, 2006, EELC.

[2] J. Perkell, et al. A Neural Model of Speech Production and Its Application to Studies of the Role of Auditory Feedback in Speech, 2003.

[3] Inna Mikhailova, et al. Expectation-driven autonomous learning and interaction system, 2008, Humanoids 2008 - 8th IEEE-RAS International Conference on Humanoid Robots.

[4] R. G. Leonard, et al. A database for speaker-independent digit recognition, 1984, ICASSP.

[5] Peter W. Jusczyk, et al. How infants begin to extract words from speech, 1999, Trends in Cognitive Sciences.

[6] Bernd J. Kröger, et al. Towards a neurocomputational model of speech production and perception, 2009, Speech Commun.

[7] A. Meltzoff, et al. Infant vocalizations in response to speech: vocal imitation and developmental change, 1996, The Journal of the Acoustical Society of America.

[8] Ian S. Howard, et al. A Computational Model of Infant Speech Development, 2007.

[9] 1A2-L07 Finding the Correspondence of Caregiver's Vowel Categories Based on Unconscious Anchoring in Maternal Imitation, 2007.

[10] Miguel Vaz, et al. Speech imitation with a child’s voice: addressing the correspondence problem, 2009.

[11] G. Westermann, et al. A new model of sensorimotor coupling in the development of speech, 2004, Brain and Language.

[12] B. Wrede, et al. A self-referential childlike model to acquire phones, syllables and words from acoustic speech, 2008, 2008 7th IEEE International Conference on Development and Learning.

[13] Brian Scassellati, et al. Audio Speech Segmentation Without Language-Specific Knowledge, 2006.

[14] E. Newport, et al. Computation of Conditional Probability Statistics by 8-Month-Old Infants, 1998.