Combining active learning and local patch alignment for data-driven facial animation with fine-grained local detail

Abstract Data-driven facial animation has attracted considerable attention from both academia and industry in recent years. Typically, the motion data used to animate a face are derived from 3D or 2D facial features whose positions on the face are chosen empirically. An automatic approach for determining the optimal feature positions for face deformation is still lacking, and current face deformation methods cannot preserve fine-grained local geometric characteristics. This paper proposes a novel scheme for facial animation in which an active learning method based on the Locally Linear Reconstruction algorithm determines the optimal feature positions on the face, and the Semi-Supervised Local Patch Alignment algorithm then deforms the face using the features selected at those positions. The active learning model can be solved by a sequential approach, and the Semi-Supervised Local Patch Alignment model by a least-squares method. Experimental results on various types of faces demonstrate the superiority of the proposed scheme over existing approaches in both feature point selection and preservation of fine-grained local characteristics.
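
The two-stage pipeline described in the abstract can be illustrated with a small sketch. It is not the paper's implementation: it assumes an LLE-style alignment matrix as a stand-in for the local patch alignment term, a greedy one-point-at-a-time search as the "sequential approach", and propagation of feature displacements (rather than positions) in the least-squares step. All function names, the neighbourhood size `k`, the trade-off `mu`, and the `ridge` term are hypothetical choices made for illustration.

```python
import numpy as np


def lle_weights(X, k=6, reg=1e-3):
    """Weights (rows sum to 1) that rebuild each point from its k nearest neighbours."""
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                          # never use a point as its own neighbour
        nbrs = np.argsort(d)[:k]
        Z = X[nbrs] - X[i]                     # neighbours centred on x_i
        G = Z @ Z.T                            # local Gram matrix (k x k)
        G += (reg * np.trace(G) + 1e-12) * np.eye(k)   # assumed regulariser, keeps G invertible
        w = np.linalg.solve(G, np.ones(k))
        W[i, nbrs] = w / w.sum()
    return W


def alignment_matrix(X, k=6):
    """M = (I - W)^T (I - W): small when every point agrees with its neighbourhood."""
    R = np.eye(X.shape[0]) - lle_weights(X, k)
    return R.T @ R


def select_features(X, m, k=6, mu=1.0, ridge=1e-8):
    """Greedy (sequential) choice of m points that best reconstruct all points."""
    n = X.shape[0]
    M = alignment_matrix(X, k)
    chosen = []
    for _ in range(m):
        best, best_err = -1, np.inf
        for c in range(n):
            if c in chosen:
                continue
            lam = np.zeros(n)
            lam[chosen + [c]] = 1.0            # indicator of the candidate feature set
            A = np.diag(lam) + mu * M + ridge * np.eye(n)   # ridge is an assumption
            Q = np.linalg.solve(A, lam[:, None] * X)        # reconstruction from features only
            err = float(np.sum((X - Q) ** 2))
            if err < best_err:
                best, best_err = c, err
        chosen.append(best)
    return chosen


def deform(X, feat_idx, feat_targets, k=6):
    """Fix feature displacements and solve the free vertices by least squares."""
    n = X.shape[0]
    M = alignment_matrix(X, k)
    feat_idx = np.asarray(feat_idx)
    free = np.setdiff1d(np.arange(n), feat_idx)
    D = np.zeros_like(X, dtype=float)
    D[feat_idx] = feat_targets - X[feat_idx]   # known displacements at the features
    A = M[np.ix_(free, free)]
    B = M[np.ix_(free, feat_idx)]
    # Minimising tr(D^T M D) with feature rows fixed gives A D_free = -B D_feat.
    D[free] = np.linalg.lstsq(A, -B @ D[feat_idx], rcond=None)[0]
    return X + D


# Toy "face": a jittered 6 x 6 grid of 2D vertices.
rng = np.random.default_rng(0)
gx, gy = np.meshgrid(np.arange(6.0), np.arange(6.0))
X = np.column_stack([gx.ravel(), gy.ravel()]) + 0.05 * rng.standard_normal((36, 2))
idx = select_features(X, m=5)                  # stage 1: choose feature positions
shift = np.array([0.5, -0.25])
Y = deform(X, idx, X[idx] + shift)             # stage 2: deform from the moved features
```

In this toy run every selected feature is moved by the same offset, so the least-squares step should carry the whole point set along rigidly. Real motion data would move the features differently, and the free vertices would then follow a smooth blend of those displacements.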
