Control of speech-related facial movements of an avatar from video

Several puppetry techniques have recently been proposed to transfer emotional facial expressions from a user's video to an avatar. Whereas the generation of facial expressions may not be sensitive to small tracking errors, the generation of speech-related facial movements would be severely impaired. Since incongruent facial movements can drastically influence speech perception, we proposed a more effective method to transfer speech-related facial movements from a user to an avatar. After a facial tracking phase, speech articulatory parameters (controlling the jaw and the lips) were determined from the set of landmark positions. Two additional processes computed the articulatory parameters controlling the eyelids and the tongue from the 2D Discrete Cosine Transform coefficients of the eye and inner-mouth images.

A speech-in-noise perception experiment was conducted with 25 participants to evaluate the system. An increase in intelligibility was shown for the avatar and human auditory-visual conditions compared to the avatar and human auditory-only conditions, respectively. The results of the avatar auditory-visual presentation depended on the vocalic context: all consonants were better perceived in the /a/ vocalic context than in /i/ and /u/, because of the lack of depth information retrievable from video. This method could be used to accurately animate avatars for hearing-impaired people using information technologies and telecommunication.
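
As a rough illustration of the feature-extraction step described above (not the authors' implementation), the following Python sketch computes the 2D Discrete Cosine Transform of a tracked eye or inner-mouth crop and keeps the low-frequency coefficients as a compact descriptor. The region box, the number of retained coefficients, and the dct_features name are illustrative assumptions; the mapping from these coefficients to eyelid or tongue articulatory parameters would be learned separately (e.g., by regression).

import numpy as np
from scipy.fft import dctn


def dct_features(gray_frame: np.ndarray, box: tuple, n_coeffs: int = 6) -> np.ndarray:
    """Return low-frequency 2D DCT coefficients of a rectangular image crop.

    box is (x, y, w, h) in pixel coordinates of the tracked eye or inner-mouth region.
    """
    x, y, w, h = box
    crop = gray_frame[y:y + h, x:x + w].astype(np.float64)
    coeffs = dctn(crop, norm="ortho")            # 2D Discrete Cosine Transform of the patch
    return coeffs[:n_coeffs, :n_coeffs].ravel()  # keep the top-left (low-frequency) block

Such a descriptor varies smoothly with eyelid aperture or visible tongue area, which is what makes it a plausible input for estimating the corresponding articulatory parameters.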
