Fusion of children's speech and 2D gestures when conversing with 3D characters

Most existing multimodal prototypes that let users combine 2D gestures and speech input are task-oriented: they help adult users solve particular information tasks, often in standard 2D graphical user interfaces. This paper describes the NICE Andersen system, which aims to demonstrate multimodal conversation between humans and embodied historical and literary characters. The target users are children and teenagers aged 10–18. We discuss issues in 2D gesture recognition and interpretation, as well as the temporal and semantic dimensions of input fusion, ranging from system and component design through technical evaluation to user evaluation with two different user groups. We observed that recognition and understanding of spoken deictics were quite robust and that spoken deictics were always present in multimodal input. We identify the causes of the most frequent input fusion failures and suggest improvements for removing them. The concluding discussion summarises what the NICE Andersen system has taught us about how children gesture and combine their 2D gestures with speech when conversing with a 3D character, and looks at some of the challenges facing theoretical solutions aimed at supporting unconstrained speech/2D gesture fusion.
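To make the temporal and semantic dimensions of input fusion concrete, the sketch below pairs a spoken deictic with a 2D gesture when the two fall within a fixed time window, resolving the deictic to the gestured object. This is a minimal illustrative Python sketch, not the fusion component actually implemented in NICE Andersen; the class names, the deictic list, and the FUSION_WINDOW threshold are all assumptions made for the example.

```python
# Illustrative time-window fusion of speech deictics and 2D gestures.
# All names and thresholds are hypothetical, not taken from NICE Andersen.
from dataclasses import dataclass
from typing import Optional

@dataclass
class SpeechInput:
    text: str        # recognised utterance, e.g. "what is that?"
    start: float     # utterance start time in seconds
    end: float       # utterance end time in seconds

@dataclass
class GestureInput:
    referent: str    # object hit by the 2D gesture, e.g. "hat"
    timestamp: float # gesture time in seconds

DEICTICS = {"this", "that", "these", "those", "here", "there"}
FUSION_WINDOW = 4.0  # assumed maximum speech/gesture offset in seconds

def fuse(speech: SpeechInput, gesture: GestureInput) -> Optional[dict]:
    """Pair a spoken deictic with a gesture if they are temporally close,
    and resolve the deictic to the gestured referent (semantic fusion)."""
    has_deictic = any(w.strip("?.,!") in DEICTICS
                      for w in speech.text.lower().split())
    # Temporal dimension: offset between the gesture and the utterance.
    if speech.start <= gesture.timestamp <= speech.end:
        offset = 0.0
    else:
        offset = min(abs(gesture.timestamp - speech.start),
                     abs(gesture.timestamp - speech.end))
    if has_deictic and offset <= FUSION_WINDOW:
        return {"utterance": speech.text, "referent": gesture.referent}
    return None  # fall back to unimodal interpretation

# Example: "what is that?" spoken while pointing at Andersen's hat.
print(fuse(SpeechInput("what is that?", 10.2, 11.0),
           GestureInput("hat", 10.6)))
# -> {'utterance': 'what is that?', 'referent': 'hat'}
```

A real fusion component would also need to handle gestures without accompanying deictics, multiple candidate referents, and recogniser confidence scores; the fixed window here merely illustrates the temporal pairing step.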
