Facial Expression Synthesis Based on Emotion Dimensions for Affective Talking Avatar

Facial expression is one of the most expressive ways for human beings to convey emotion, intention, and other nonverbal messages in face-to-face communication. In this chapter, a layered parametric framework is proposed to synthesize emotional facial expressions for an MPEG-4 compliant talking avatar based on the three-dimensional PAD model, whose dimensions are pleasure-displeasure, arousal-nonarousal, and dominance-submissiveness. The PAD dimensions capture the high-level emotional state of the talking avatar associated with a specific facial expression. A set of partial expression parameters (PEPs) is designed to depict expressive facial motion patterns in local face areas and to reduce the complexity of directly manipulating the low-level MPEG-4 facial animation parameters (FAPs). The relationships among the emotion (PAD), expression (PEP), and animation (FAP) parameters are analyzed on a virtual facial expression database. Two levels of parameter mapping are implemented: an emotion-expression mapping from PAD to PEP, and linear interpolation from PEP to FAP. The synthesized emotional facial expression is combined with the talking avatar's speech animation in a text-to-audio-visual-speech system. Perceptual evaluation shows that the approach can generate appropriate facial expressions for subtle and complex emotions defined by PAD values, and thus enhances the emotional expressivity of the talking avatar.
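To make the two-level mapping concrete, the sketch below traces a single PAD state through the pipeline: an emotion-expression mapping from the 3-dimensional PAD vector to PEP intensities, followed by linear interpolation from each PEP to its group of MPEG-4 FAP displacements. This is a minimal illustration, not the chapter's implementation: the weights and extreme-FAP targets are random placeholders standing in for values fitted on the virtual facial expression database, the number of PEPs is an assumption, and the linear form of the first mapping is likewise assumed for simplicity (only the count of 68 MPEG-4 FAPs comes from the standard).

```python
import numpy as np

# --- Level 0: emotion state in PAD space ---------------------------------
# Pleasure, arousal, dominance, each normalized to [-1, 1].
pad = np.array([0.6, 0.4, -0.2])  # e.g. a mildly pleased, slightly submissive state

# --- Level 1: emotion-expression mapping, PAD -> PEP ---------------------
# A linear map W_pep @ pad + b_pep stands in for whatever mapping is
# fitted on the expression database; N_PEP and all weights are placeholders.
N_PEP = 8
rng = np.random.default_rng(0)               # fixed seed for a reproducible demo
W_pep = rng.uniform(-0.5, 0.5, (N_PEP, 3))   # placeholder regression weights
b_pep = np.zeros(N_PEP)
pep = np.clip(W_pep @ pad + b_pep, 0.0, 1.0)  # PEP intensities clipped to [0, 1]

# --- Level 2: linear interpolation, PEP -> FAP ---------------------------
# Each PEP drives a fixed set of FAPs: intensity 0 is the neutral face,
# intensity 1 is the extreme displacement of that local motion pattern.
N_FAP = 68                                    # MPEG-4 defines 68 FAPs
fap_extreme = rng.integers(-400, 400, (N_PEP, N_FAP)).astype(float)  # placeholder targets
fap = np.zeros(N_FAP)
for i in range(N_PEP):
    fap += pep[i] * fap_extreme[i]            # linear blend of local motion patterns

print("PEP intensities:", np.round(pep, 2))
print("First 10 FAP values:", np.round(fap[:10], 1))
```

Because each stage is a low-dimensional parameter vector, subtle or blended emotions fall out of intermediate PAD values directly; no discrete emotion label is needed before an expression can be rendered.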
