Towards using prosody to scaffold lexical meaning in robots

We present a case study analysing the prosodic contours and salient word markers in a small corpus of robot-directed speech, in which human participants were asked to talk to a socially interactive robot as if it were a child. We assess whether these contours and salience characteristics could be used to extract information relevant to the subsequent learning and scaffolding of meaning in robots. The study uses measures of pitch, energy and word duration from the participants' speech, and draws on Pierrehumbert and Hirschberg's theory of the meaning of intonational contours, which may indicate what beliefs are shared between speaker and listener. The results indicate that 1) participants use a high number of contours that mark new information for the robot, 2) prosodic question contours decrease as the interactions proceed, 3) pitch, energy and duration features can provide strong markers of relevant words, and 4) there was little evidence that participants altered their prosodic contours in recognition of shared belief. We also describe and verify our software, which allows prosodic phrases to be marked semi-automatically.
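To illustrate how pitch, energy and duration might jointly flag salient words, the sketch below z-scores each feature within an utterance and ranks words by a weighted sum. This is a minimal illustration under stated assumptions, not the authors' method. The per-word mean pitch (Hz), mean energy and word boundaries are assumed to have already been produced by a pitch tracker and a word aligner. The equal-weight linear combination is an illustrative choice, and the names `Word`, `zscores` and `salience_ranking` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Word:
    # Hypothetical per-word record; values are assumed to be precomputed
    # by an external pitch tracker and word aligner.
    text: str
    start: float     # word onset in seconds
    end: float       # word offset in seconds
    pitch_hz: float  # mean F0 over the word
    energy: float    # mean RMS energy over the word

    @property
    def duration(self) -> float:
        return self.end - self.start


def zscores(values):
    """Standardise values within one utterance (population std dev)."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # A constant feature carries no salience information.
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def salience_ranking(words, weights=(1.0, 1.0, 1.0)):
    """Rank words by a weighted sum of z-scored pitch, energy and duration.

    The linear, equal-weight combination is an assumption for illustration.
    """
    zp = zscores([w.pitch_hz for w in words])
    ze = zscores([w.energy for w in words])
    zd = zscores([w.duration for w in words])
    wp, we, wd = weights
    scored = [
        (w.text, wp * p + we * e + wd * d)
        for w, p, e, d in zip(words, zp, ze, zd)
    ]
    # Highest combined score first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Invented toy utterance: "look at the RED BALL" with stress on the object words.
    utterance = [
        Word("look", 0.00, 0.25, 210.0, 0.30),
        Word("at", 0.25, 0.35, 200.0, 0.20),
        Word("the", 0.35, 0.45, 195.0, 0.18),
        Word("red", 0.45, 0.90, 320.0, 0.65),
        Word("ball", 0.90, 1.40, 280.0, 0.55),
    ]
    for text, score in salience_ranking(utterance):
        print(f"{text:>5s} {score:+.2f}")
```

On this toy utterance, "red" and "ball" rank highest because they are both higher-pitched, louder and longer than the function words. Standardising within each utterance keeps speakers with naturally higher pitch or louder voices from dominating the ranking.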
