Emotional facial expression transfer based on temporal restricted Boltzmann machines

Emotional facial expression transfer involves a sequence-to-sequence mapping from a neutral facial expression to an emotional one, a well-known problem in computer graphics. The methods currently considered in the graphics community are typically linear (e.g., methods based on blendshape mapping) and do not take the dynamics of the facial motion itself into account, which makes it difficult to retarget the facial articulations involved in speech. In this paper, we apply a model based on temporal restricted Boltzmann machines (TRBMs) to emotional facial expression transfer. The model can encode a complex nonlinear mapping from the motion of a neutral facial expression to that of an emotional one, capturing the facial geometry and dynamics of both the neutral and the emotional state.
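
To make the temporal conditioning concrete, below is a minimal sketch (not the authors' implementation) of a conditional RBM of the kind that underlies TRBM-style motion models: Gaussian visible units for the current motion frame, binary hidden units, and autoregressive weights that turn a short history of past frames into dynamic biases, trained with one-step contrastive divergence. All class and function names, dimensions, and hyperparameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class CRBM:
    """Conditional RBM sketch: Gaussian visible units (unit variance),
    binary hidden units. The current frame is conditioned on the previous
    `order` frames through autoregressive weights A (dynamic visible bias)
    and B (dynamic hidden bias); trained with CD-1."""

    def __init__(self, n_vis, n_hid, order=3, lr=1e-3):
        self.order, self.lr = order, lr
        self.W = 0.01 * rng.standard_normal((n_vis, n_hid))
        self.A = 0.01 * rng.standard_normal((order * n_vis, n_vis))
        self.B = 0.01 * rng.standard_normal((order * n_vis, n_hid))
        self.bv = np.zeros(n_vis)
        self.bh = np.zeros(n_hid)

    def _dyn_biases(self, history):
        # history: (batch, order * n_vis), concatenation of past frames
        return self.bv + history @ self.A, self.bh + history @ self.B

    def cd1_step(self, v, history):
        bv, bh = self._dyn_biases(history)
        # Positive phase: hidden activations given the data
        h_prob = sigmoid(v @ self.W + bh)
        h_samp = (rng.random(h_prob.shape) < h_prob).astype(float)
        # Negative phase: mean-field Gaussian visible reconstruction
        v_rec = h_samp @ self.W.T + bv
        h_rec = sigmoid(v_rec @ self.W + bh)
        n = v.shape[0]
        # CD-1 gradient approximations
        self.W += self.lr * (v.T @ h_prob - v_rec.T @ h_rec) / n
        self.A += self.lr * (history.T @ (v - v_rec)) / n
        self.B += self.lr * (history.T @ (h_prob - h_rec)) / n
        self.bv += self.lr * (v - v_rec).mean(axis=0)
        self.bh += self.lr * (h_prob - h_rec).mean(axis=0)
        return np.mean((v - v_rec) ** 2)  # reconstruction error

# Toy usage on random stand-in motion features
n_vis, n_hid, order = 30, 64, 3
model = CRBM(n_vis, n_hid, order)
seq = rng.standard_normal((200, n_vis))
for epoch in range(5):
    errs = []
    for t in range(order, len(seq)):
        v = seq[t:t + 1]
        hist = seq[t - order:t].reshape(1, -1)
        errs.append(model.cd1_step(v, hist))
    print(f"epoch {epoch}: recon MSE {np.mean(errs):.4f}")
```

In the input-output variant used for expression transfer, the dynamic biases would additionally be conditioned on the source (neutral) sequence, so that sampling the visible chain generates the corresponding target (emotional) motion.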
