Multimodal Dialogue Management for Multiparty Interaction with Infants

We present dialogue management routines for a system that engages in multiparty agent-infant interaction. The ultimate purpose of this research is to help infants learn a visual sign language by engaging them in naturalistic, socially contingent conversations, initiated by an artificial agent, during an early-life critical period for language development (ages 6 to 12 months). As a first step, we focus on creating and maintaining agent-infant engagement that elicits appropriate and socially contingent responses from the baby. Our system includes two agents: a physical robot and an animated virtual human. The system's multimodal perception includes an eye tracker (measuring attention) and a thermal infrared imaging camera (measuring patterns of emotional arousal). We present a dialogue policy that selects individual actions and planned multiparty sequences based on perceptual inputs about the baby's changing internal states of emotional engagement. The present version of the system was evaluated in interactions with eight babies. All babies demonstrated spontaneous and sustained engagement with the agents for several minutes, showing patterns of conversationally relevant and socially contingent behaviors. We further performed a detailed case-study analysis, annotating all agent and baby behaviors. Results show that the babies' behaviors were generally relevant to the agents' conversation and contained direct evidence of socially contingent responses to specific linguistic samples produced by the avatar. This work demonstrates the potential for language learning from agents in very young babies and has especially broad implications for the use of artificial agents with babies who have minimal language exposure in early life.
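
To make the policy description concrete, the sketch below shows one plausible shape for a dialogue policy of the kind the abstract describes: perceptual estimates from the eye tracker and thermal camera are fused into an engagement state, from which the next agent action is selected. All names, actions, and thresholds here are hypothetical illustrations, not details taken from the paper.

```python
# Hypothetical sketch of a dialogue policy mapping perceived infant
# engagement to agent actions. Names and thresholds are illustrative
# assumptions, not the paper's actual implementation.
from dataclasses import dataclass
from enum import Enum, auto


class Action(Enum):
    ROBOT_BID = auto()           # physical robot makes an attention-getting bid
    AVATAR_SIGN_SAMPLE = auto()  # virtual human produces a sign-language sample
    MULTIPARTY_HANDOFF = auto()  # robot directs the turn to the avatar
    IDLE_MONITOR = auto()        # wait, keep sensing


@dataclass
class EngagementState:
    attending_to_agent: bool  # inferred from the eye tracker
    arousal: float            # 0..1, inferred from thermal infrared imaging


def select_action(state: EngagementState) -> Action:
    """Choose the next agent action from the inferred engagement state."""
    if not state.attending_to_agent:
        # Baby is looking away: bid for attention with the robot.
        return Action.ROBOT_BID
    if state.arousal > 0.8:
        # Over-aroused: back off and let the baby settle.
        return Action.IDLE_MONITOR
    if state.arousal < 0.3:
        # Attending but under-engaged: a multiparty handoff may renew interest.
        return Action.MULTIPARTY_HANDOFF
    # Attentive and moderately aroused: deliver a linguistic sample.
    return Action.AVATAR_SIGN_SAMPLE
```

In a real system this selection would run in a closed loop, with planned multiparty sequences layered on top of the per-step choice shown here.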
