Predictive Sampling of Facial Expression Dynamics Driven by a Latent Action Space

We present a probabilistic generative model that tracks, by prediction, the dynamics of affective facial expressions in videos. The model relies on Bayesian filter sampling of facial landmarks conditioned on the dynamics of motor action parameters, namely trajectories shaped by an autoregressive Gaussian Process Latent Variable state space. The analysis-by-synthesis approach at the heart of the model supports both inference and generation of affective expressions. The robustness of the method to occlusions and to degraded video quality was assessed on a publicly available dataset.
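
To make the tracking-by-prediction idea concrete, the following is a minimal sketch of a bootstrap particle filter, not the model described above. It assumes a one-dimensional latent action parameter with AR(1) dynamics and a linear decoder in place of the Gaussian Process mapping. The names `decode`, `filter_step`, `track` and the constants `A`, `Q`, `R` are hypothetical and chosen only for illustration.

```python
import math
import random

# Hypothetical toy constants (not taken from the paper).
A = 0.9   # AR(1) coefficient of the latent action parameter
Q = 0.1   # standard deviation of the latent process noise
R = 0.2   # standard deviation of the landmark observation noise


def decode(z):
    """Toy linear map from latent state to one landmark coordinate.

    Assumption: stands in for the learned GP-LVM mapping.
    """
    return 2.0 * z + 1.0


def filter_step(particles, observation, rng):
    """One predict / weight / resample step of a bootstrap particle filter.

    `observation` is a float, or None when the landmark is occluded.
    """
    # Predict: sample each particle forward through the AR(1) dynamics.
    predicted = [A * z + rng.gauss(0.0, Q) for z in particles]
    if observation is None:
        # Occluded frame: no likelihood update, keep the predictions.
        return predicted
    # Weight: Gaussian likelihood of the observed landmark given each particle.
    weights = [
        math.exp(-0.5 * ((observation - decode(z)) / R) ** 2)
        for z in predicted
    ]
    if sum(weights) == 0.0:
        # All weights underflowed: fall back to the predictions.
        return predicted
    # Resample: draw particles in proportion to their weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


def track(observations, n_particles=500, seed=0):
    """Return the posterior-mean latent estimate for each frame."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        particles = filter_step(particles, y, rng)
        estimates.append(sum(particles) / len(particles))
    return estimates
```

In this sketch an occluded frame is handled by skipping the weighting step, so the estimate for that frame comes from the dynamics alone. This is one simple way a predictive sampler can bridge missing observations; the paper's own treatment of occlusions may differ.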