Generating Robot/Agent backchannels during a storytelling experiment

This work presents a real-time framework for studying multimodal feedback of robots and talking agents in the context of Human-Robot Interaction (HRI) and Human-Computer Interaction (HCI). To evaluate the framework, a multimodal corpus (ENTERFACE_STEAD) was built, and a study of the most relevant multimodal features was carried out in order to build an active robot/agent listener for a storytelling experience with humans. The experiments show that even when the same reactive behavior models are built for the robot and the talking agent, the interpretation and realization of the communicated behavior differ, owing to the different communicative channels each offers: physical but less human-like for the robot, and virtual but more expressive and human-like for the talking agent.
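To make the idea of a reactive listener model more concrete, the sketch below shows how a single prosodic cue, a sustained region of low pitch in the spirit of Ward's backchannel rule, could trigger a listener response such as a nod or an "mm-hmm". This is a minimal, hypothetical illustration in Python; the function name, parameters, and thresholds are assumptions for exposition and are not the framework's actual implementation.

```python
import numpy as np

def detect_backchannel_cues(f0, hop_s=0.01, low_pitch_pct=26,
                            min_region_s=0.11, min_gap_s=0.8):
    """Flag times at which a sustained low-pitch region ends, a classic
    prosodic cue for listener backchannels (assumed thresholds).

    f0            : per-frame fundamental-frequency estimates (Hz, 0 = unvoiced)
    hop_s         : frame hop in seconds
    low_pitch_pct : percentile below which a voiced frame counts as "low"
    min_region_s  : minimum duration of the low-pitch region
    min_gap_s     : refractory period between successive cues
    """
    voiced = f0[f0 > 0]
    if voiced.size == 0:
        return []
    threshold = np.percentile(voiced, low_pitch_pct)
    min_frames = int(min_region_s / hop_s)
    gap_frames = int(min_gap_s / hop_s)

    cues, run, last_cue = [], 0, -gap_frames
    for i, pitch in enumerate(f0):
        if 0 < pitch <= threshold:
            run += 1                        # still inside a low-pitch region
        else:
            if run >= min_frames and i - last_cue >= gap_frames:
                cues.append(i * hop_s)      # time (s) to emit a nod or "mm-hmm"
                last_cue = i
            run = 0
    return cues
```

In a full system such a trigger would be only one input among several multimodal features (gaze, head movement, pauses), and the resulting behavior would be rendered differently on the physical robot and on the virtual talking agent.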
