Facial expression intensity estimation using Siamese and triplet networks

Abstract: This paper investigates the ability of Siamese and triplet networks to estimate emotional intensity in facial image sequences. Our method exploits the sequential relationship in the temporal domain that arises from the natural onset-apex-offset pattern of a facial expression. Siamese and triplet networks are shown to outperform earlier convolutional neural networks on this task. The branches of the Siamese and triplet networks help produce a more definite output. Compared with the Siamese network, the triplet network learns a clearer internal feature representation, and the localization of those features becomes more accurate as training proceeds. This property improves the network's generalization when dealing with similar sequential images. We confirmed this with experiments on the Cohn–Kanade, MUG and MMI datasets for intensity estimation and on the CASME, CASME II and CAS(ME)² datasets for micro-expression detection.
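The abstract does not state the loss functions used, so the following is only a minimal sketch of how temporal order between frames could serve as a training signal for branch networks of this kind. It assumes a hinge-style ranking loss on scalar intensity scores; the function names, the margin value, and the example scores are all hypothetical and are not taken from the paper.

```python
import numpy as np

def pairwise_ranking_loss(score_early, score_late, margin=1.0):
    """Hinge loss (assumed form): zero once the later frame outscores
    the earlier frame by at least `margin`."""
    return np.maximum(0.0, margin - (score_late - score_early))

def triplet_ranking_loss(score_low, score_mid, score_high, margin=1.0):
    """Assumed triplet variant: sum of two pairwise hinge terms over
    three frames ordered along the onset-to-apex ramp."""
    return (pairwise_ranking_loss(score_low, score_mid, margin)
            + pairwise_ranking_loss(score_mid, score_high, margin))

# Hypothetical scores that a shared-weight branch might assign to three
# frames sampled in temporal order between onset and apex.
scores = np.array([0.1, 0.9, 2.5])

# Two-branch (Siamese-style) comparison of the first and last frame.
siamese = pairwise_ranking_loss(scores[0], scores[2])

# Three-branch (triplet-style) comparison that also constrains the middle frame.
triplet = triplet_ranking_loss(scores[0], scores[1], scores[2])

print("pairwise loss:", siamese)
print("triplet loss:", triplet)
```

In this toy case the pairwise term is already satisfied, while the triplet term still penalizes the small gap between the first two frames. That is one way to picture why a third branch can impose a finer ordering constraint than a pair alone.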
