Towards Facial Gestures Generation by Speech Signal Analysis Using HUGE Architecture

In this work we focus on finding correlations between the speech signal and the occurrence of facial gestures. The motivation is the creation of a believable computer-generated human representative, an embodied conversational agent (ECA). To be believable, an ECA must produce facial gestures in addition to verbal and emotional displays. The information needed to generate facial gestures is extracted from speech prosody by analyzing natural speech in real time. This work builds on the previously developed HUGE architecture for statistically based facial gesturing and extends our earlier work on automatic real-time lip synchronization.
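As a rough illustration of the kind of real-time prosodic analysis described above, the sketch below computes short-time energy and an autocorrelation-based pitch (F0) estimate per audio frame, the sort of features that could drive gesture triggering. The frame length, pitch range, voicing threshold, and estimator choice are illustrative assumptions, not the actual pipeline used in this work.

```python
import numpy as np

SAMPLE_RATE = 16000          # assumed audio sampling rate in Hz
FRAME_LEN = 512              # ~32 ms analysis frame (assumption)
F0_MIN, F0_MAX = 75, 400     # plausible speech pitch range in Hz

def frame_energy(frame: np.ndarray) -> float:
    """Short-time energy of one analysis frame."""
    return float(np.mean(frame ** 2))

def frame_pitch(frame: np.ndarray) -> float:
    """Crude F0 estimate from the autocorrelation peak; 0.0 means unvoiced."""
    frame = frame - np.mean(frame)
    # Keep only non-negative lags of the autocorrelation.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(SAMPLE_RATE / F0_MAX)
    lag_max = int(SAMPLE_RATE / F0_MIN)
    if ac[0] <= 0 or lag_max >= len(ac):
        return 0.0
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
    # Treat weak correlation peaks as unvoiced frames (threshold is an assumption).
    if ac[lag] / ac[0] < 0.3:
        return 0.0
    return SAMPLE_RATE / lag

def prosodic_features(samples: np.ndarray):
    """Yield (energy, f0) per frame, e.g. to feed a gesture-triggering model."""
    for start in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN):
        frame = samples[start:start + FRAME_LEN]
        yield frame_energy(frame), frame_pitch(frame)
```

In a real-time setting the same per-frame loop would run on incoming audio buffers, with the resulting energy and F0 contours feeding the statistical gesture model rather than being collected offline.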
