Using Group History to Identify Character-Directed Utterances in Multi-Child Interactions

Addressee identification is an element of all language-based interaction and is critical for turn-taking. We examine the specific problem of identifying when each child in a small group playing an interactive game is speaking to an animated character. After analyzing both child and adult behavior, we explore a family of machine learning models that integrate audio and visual features with temporal group interactions and limited, task-independent language. The best model performs identification about 20% better than a model that uses only the audio-visual features of the individual child.
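
As a rough illustration of the modeling setup described above, the sketch below classifies each utterance as character-directed by concatenating one child's own audio-visual cues with simple group-history cues and passing them to an SVM. This is a minimal sketch, not the paper's implementation: every feature name, the toy values, and the scikit-learn pipeline are assumptions made for illustration only.

```python
# Minimal, hypothetical sketch of addressee classification:
# per-utterance features = child's own audio-visual cues + group-history cues.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def utterance_features(child, history):
    """Concatenate one child's audio-visual cues with group-history cues.
    All keys are illustrative, not features from the paper."""
    return np.array([
        child["energy"],                  # speech energy of the utterance
        child["pitch_mean"],              # mean fundamental frequency
        child["gaze_at_character"],       # fraction of time facing the character
        history["since_character_turn"],  # seconds since the character last spoke
        history["peer_overlap"],          # proportion of overlapping peer speech
        history["prev_char_directed"],    # 1 if previous utterance was character-directed
    ])

# Toy data standing in for annotated utterances (label 1 = character-directed).
examples = [
    ({"energy": 0.8, "pitch_mean": 280, "gaze_at_character": 0.9},
     {"since_character_turn": 1.2, "peer_overlap": 0.0, "prev_char_directed": 1}, 1),
    ({"energy": 0.5, "pitch_mean": 240, "gaze_at_character": 0.1},
     {"since_character_turn": 9.5, "peer_overlap": 0.6, "prev_char_directed": 0}, 0),
]
X = np.array([utterance_features(c, h) for c, h, _ in examples])
y = np.array([label for _, _, label in examples])

# Standardize features, then fit an RBF-kernel SVM on the combined vector.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, y)
print(clf.predict(X))
```

The point of the combined vector is that the group-history half lets the classifier exploit temporal context (e.g., who spoke last, and how long ago the character held the floor) that a single child's audio-visual stream cannot supply on its own.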
