Real-Time Cleaning and Refinement of Facial Animation Signals

With the increasing demand for real-time animated 3D content in the entertainment industry and beyond, performance-based animation has garnered interest in both the academic and industrial communities. While recent motion-capture animation solutions achieve impressive results, manual post-processing is often required because the generated animations contain artifacts. Existing real-time motion-capture solutions rely on standard signal-processing methods to strengthen the temporal coherence of the resulting animations and remove inaccuracies. While these methods produce smooth results, they inherently filter out part of the dynamics of facial motion, such as high-frequency transient movements. In this work, we propose a real-time animation refinement system that preserves, or even restores, the natural dynamics of facial motion. To do so, we leverage an off-the-shelf recurrent neural network architecture that learns the patterns of natural facial dynamics from clean animation data. We parameterize our system with the temporal derivatives of the signal, enabling the network to process animations at any frame rate. Qualitative results show that our system recovers natural motion signals from noisy or degraded input animations.
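To make the derivative parameterization concrete, the minimal PyTorch sketch below shows one plausible realization of such a refinement network. It is an illustration under stated assumptions, not the paper's implementation: the class name DerivativeLSTMRefiner, the layer sizes, the 51-channel blendshape signal, and the synthetic-noise training setup are all hypothetical. What it demonstrates is the idea stated in the abstract: the recurrent network operates on per-frame temporal derivatives (together with the frame period) rather than absolute values, and the cleaned derivatives are re-integrated, which decouples the model from the capture frame rate.

import torch
import torch.nn as nn

class DerivativeLSTMRefiner(nn.Module):
    """One plausible derivative-parameterized refiner (hypothetical design).

    The network never sees absolute per-frame values: it consumes temporal
    derivatives plus the frame period dt, and predicts cleaned derivatives
    that are re-integrated, keeping the model agnostic to the frame rate.
    """

    def __init__(self, n_channels: int, hidden: int = 128):
        super().__init__()
        # Input = derivatives of all channels + dt as an extra feature.
        self.lstm = nn.LSTM(n_channels + 1, hidden, num_layers=2,
                            batch_first=True, dropout=0.1)
        self.head = nn.Linear(hidden, n_channels)

    def forward(self, x: torch.Tensor, dt: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, channels) noisy signal; dt: (batch,) seconds per frame.
        dt3 = dt.view(-1, 1, 1)
        v = torch.diff(x, dim=1) / dt3                    # temporal derivatives
        dt_feat = dt3.expand(-1, v.shape[1], 1)
        h, _ = self.lstm(torch.cat([v, dt_feat], dim=-1))
        v_clean = self.head(h)                            # refined derivatives
        # Re-integrate from the first frame to recover the cleaned signal.
        y_rest = x[:, :1] + torch.cumsum(v_clean * dt3, dim=1)
        return torch.cat([x[:, :1], y_rest], dim=1)

# Denoising-style training sketch: corrupt clean capture clips and regress
# back to them (data shapes and noise model are placeholders).
model = DerivativeLSTMRefiner(n_channels=51)     # e.g. 51 blendshape weights
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
clean = torch.rand(8, 120, 51)                   # stand-in for clean clips
dt = torch.full((8,), 1.0 / 30.0)                # clips captured at 30 fps
noisy = clean + 0.05 * torch.randn_like(clean)   # synthetic degradation
loss = nn.functional.mse_loss(model(noisy, dt), clean)
opt.zero_grad(); loss.backward(); opt.step()

Dividing by dt and feeding it as an input feature is only one way to obtain frame-rate independence; the abstract states that the system is parameterized by temporal derivatives but does not specify these details.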
