Prosody-based adaptive metaphoric head and arm gesture synthesis in human-robot interaction

In human-human interaction, communication is established through three modalities: verbal, non-verbal (i.e., gestures), and/or para-verbal (i.e., prosody). The linguistic literature shows that para-verbal and non-verbal cues are naturally aligned and synchronized; however, the mechanism underlying this synchronization remains unexplored. Coordinating prosody with metaphoric head-arm gestures is difficult because of the meaning being conveyed, the way gestures are performed with respect to prosodic characteristics, their relative temporal arrangement, and their coordinated organization within the phrasal structure of the utterance. In this research, we focus on the mapping between head-arm gestures and speech prosodic characteristics in order to generate robot behavior that adapts to the interacting human's emotional state. Prosody patterns and the motion curves of head-arm gestures are aligned separately by parallel Hidden Markov Models (HMMs). The mapping between speech and head-arm gestures is then performed with a Coupled Hidden Markov Model (CHMM), which can be seen as a multi-stream collection of HMMs characterizing the segmented prosody and head-arm gesture data. An emotional-state-based audio-video database was created to validate this study. The obtained results show the effectiveness of the proposed methodology.
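
To make the cross-stream coupling concrete, the following minimal numpy sketch implements the forward recursion of a two-chain coupled HMM over a joint prosody-gesture state space. It is not the paper's trained model: all state counts, transition matrices, Gaussian emission parameters, and the synthetic feature streams are illustrative placeholders, and the emissions are simplified to unit-variance Gaussians.

```python
import numpy as np

rng = np.random.default_rng(0)

N_P, N_G, T, D = 3, 4, 50, 2   # prosody states, gesture states, frames, feature dim

# Coupled transitions: each chain's next state is conditioned on the
# previous states of BOTH chains -- this is the cross-stream coupling
# that distinguishes a CHMM from two independent HMMs.
A_p = rng.dirichlet(np.ones(N_P), size=(N_P, N_G))  # P(p_t | p_{t-1}, g_{t-1})
A_g = rng.dirichlet(np.ones(N_G), size=(N_P, N_G))  # P(g_t | p_{t-1}, g_{t-1})
pi_p = np.full(N_P, 1.0 / N_P)
pi_g = np.full(N_G, 1.0 / N_G)

# Toy Gaussian emission means (identity covariance for brevity).
mu_p = rng.normal(size=(N_P, D))
mu_g = rng.normal(size=(N_G, D))

def log_gauss(x, mu):
    """Log density of an isotropic unit-variance Gaussian."""
    d = x - mu
    return -0.5 * (d @ d) - 0.5 * len(x) * np.log(2 * np.pi)

def chmm_forward(obs_p, obs_g):
    """Forward recursion over the joint state (p, g); returns log-likelihood."""
    # log alpha over joint states, shape (N_P, N_G)
    log_alpha = (np.log(pi_p)[:, None] + np.log(pi_g)[None, :]
                 + np.array([[log_gauss(obs_p[0], mu_p[i]) + log_gauss(obs_g[0], mu_g[j])
                              for j in range(N_G)] for i in range(N_P)]))
    for t in range(1, len(obs_p)):
        new = np.full((N_P, N_G), -np.inf)
        for i in range(N_P):        # next prosody state
            for j in range(N_G):    # next gesture state
                # sum over all previous joint states (p_{t-1}, g_{t-1})
                terms = log_alpha + np.log(A_p[:, :, i]) + np.log(A_g[:, :, j])
                new[i, j] = (np.logaddexp.reduce(terms.ravel())
                             + log_gauss(obs_p[t], mu_p[i])
                             + log_gauss(obs_g[t], mu_g[j]))
        log_alpha = new
    return np.logaddexp.reduce(log_alpha.ravel())

# Synthetic parallel streams standing in for segmented prosody features
# (e.g., pitch/energy contours) and gesture motion-curve features.
obs_p = rng.normal(size=(T, D))
obs_g = rng.normal(size=(T, D))
print("CHMM log-likelihood:", chmm_forward(obs_p, obs_g))
```

In a full pipeline of the kind the abstract describes, the per-state emission models over prosodic and motion-curve features would be trained (e.g., via EM) on the aligned audio-video data, and gesture synthesis would amount to decoding the most likely gesture-chain state sequence given a new prosody stream; the sketch above only evaluates the joint likelihood of two parallel streams.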
