Barycentric Representation and Metric Learning for Facial Expression Recognition

In this paper, we tackle the problem of dynamic facial expression recognition. We propose an affine-invariant facial shape representation based on barycentric coordinates and relate it to the Grassmannian representation. Unlike the latter, the barycentric representation lets us work directly in Euclidean space and apply a metric learning algorithm to find a metric discriminative enough to compare facial shapes under different expressions. Finally, we use the learned metric in a pipeline that combines a Dynamic Time Warping (DTW) phase with a pairwise proximity function SVM classifier for rate-invariant classification of facial sequences. Experiments on the AFEW dataset show that the approach is effective while using only geometric features.
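As a rough illustration of the two ingredients named above, the following sketch computes barycentric coordinates of 2D landmarks with respect to a reference triangle and compares two sequences with DTW under a Mahalanobis-type metric. It is a minimal sketch under stated assumptions, not the implementation used in the paper: the function names, the choice of the first three landmarks as the reference triangle, and the identity matrix standing in for a learned metric `M` are all hypothetical.

```python
import numpy as np

def barycentric_coords(landmarks, ref_idx=(0, 1, 2)):
    """Barycentric coordinates of 2D landmarks w.r.t. a reference triangle.

    ref_idx (an assumption of this sketch) selects three non-collinear
    landmarks that form the reference triangle.
    """
    tri = landmarks[list(ref_idx)]                       # shape (3, 2)
    # Solve [p1 p2 p3; 1 1 1] @ lam = [p; 1] for every landmark p.
    T = np.vstack([tri.T, np.ones(3)])                   # shape (3, 3)
    rhs = np.vstack([landmarks.T, np.ones(len(landmarks))])
    return np.linalg.solve(T, rhs).T                     # shape (n, 3)

def dtw_distance(seq_a, seq_b, M=None):
    """DTW cost with local distance sqrt((x - y)^T M (x - y)).

    M is a placeholder for a learned positive semidefinite metric;
    the identity is used when none is given.
    """
    M = np.eye(seq_a.shape[1]) if M is None else M
    na, nb = len(seq_a), len(seq_b)
    cost = np.full((na + 1, nb + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, na + 1):
        for j in range(1, nb + 1):
            diff = seq_a[i - 1] - seq_b[j - 1]
            local = np.sqrt(max(diff @ M @ diff, 0.0))
            cost[i, j] = local + min(cost[i - 1, j],
                                     cost[i, j - 1],
                                     cost[i - 1, j - 1])
    return cost[na, nb]

# Synthetic landmarks: a fixed non-degenerate triangle plus random points.
rng = np.random.default_rng(0)
shape = np.vstack([[[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
                   rng.normal(size=(7, 2))])

# An arbitrary invertible affine map x -> A x + b.
A = np.array([[2.0, 0.5], [-0.3, 1.5]])
b = np.array([4.0, -1.0])
lam = barycentric_coords(shape)
lam_affine = barycentric_coords(shape @ A.T + b)

# A synthetic sequence of flattened barycentric features and a copy
# played at half speed (every frame repeated twice).
seq = np.stack([barycentric_coords(shape + 0.1 * t).ravel() for t in range(5)])
seq_slow = np.repeat(seq, 2, axis=0)
d_rate = dtw_distance(seq, seq_slow)
```

In this toy setting `lam` and `lam_affine` agree up to floating-point error, which is the affine invariance the representation relies on, and `d_rate` is zero because DTW absorbs the change in execution rate. A matrix of such pairwise DTW distances could then serve as input to a proximity-based SVM, although that step is not shown here.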
