THIN: THrowable Information Networks and Application for Facial Expression Recognition In The Wild

For a number of tasks solved using deep learning techniques, an exogenous variable can be identified such that (a) it heavily influences the appearance of the different classes, and (b) an ideal classifier should be invariant to this variable. For facial expression recognition (FER), identity is an example of such an exogenous variable. In this paper, we propose a dual exogenous/endogenous representation. The former captures the exogenous variable, whereas the latter models the task at hand (e.g. facial expression). We design a prediction layer that uses a deep ensemble conditioned by the exogenous representation. It employs a differential tree gate that learns an adaptive weighting of the weak predictors, thereby modeling a partition of the exogenous representation space upon which the weak predictors specialize. This layer explicitly models the dependency between the exogenous variable and the predicted task, addressing (a). We also propose an exogenous dispelling loss that removes the exogenous information from the endogenous representation, enforcing (b). Thus, the exogenous information is used twice in a throwable fashion: first as a conditioning variable for the target task, and second to create invariance within the endogenous representation. We call this method THIN, standing for THrowable Information Networks. We experimentally validate THIN in several contexts where exogenous information can be identified, such as digit recognition under large rotations and shape recognition at multiple scales. We also apply it to FER with identity as the exogenous variable. In particular, we demonstrate that THIN significantly outperforms state-of-the-art approaches on several challenging datasets.
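To make the two uses of the exogenous representation concrete, the following NumPy sketch shows one possible reading of the description above. It is an illustration under stated assumptions, not the implementation used in the paper: the gate is taken to be a soft binary tree with sigmoid decisions, the weak predictors are taken to be linear softmax classifiers, and the dispelling term is approximated by a cross-entropy between an auxiliary exogenous prediction and the uniform distribution. All function names, parameter names, and shapes (`tree_gate`, `gated_ensemble`, `uniform_cross_entropy`, `gate_W`, `experts_W`) are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def tree_gate(exo, gate_W, gate_b, depth):
    """Soft binary tree over the exogenous representation (assumed form).

    exo:    (batch, exo_dim)
    gate_W: (exo_dim, 2**depth - 1), one decision per internal node
    gate_b: (2**depth - 1,)
    Returns (batch, 2**depth) leaf weights; each row sums to 1.
    """
    d = sigmoid(exo @ gate_W + gate_b)   # probability of branching right at each node
    batch = exo.shape[0]
    probs = np.ones((batch, 1))          # probability of reaching the root
    first = 0                            # index of the first node of the current level
    for level in range(depth):
        n = 2 ** level
        d_level = d[:, first:first + n]
        # every node passes its mass to a left (1 - d) and a right (d) child
        probs = np.stack([probs * (1.0 - d_level), probs * d_level], axis=2)
        probs = probs.reshape(batch, 2 * n)
        first += n
    return probs

def gated_ensemble(endo, exo, gate_W, gate_b, experts_W, depth):
    """Weighted sum of weak predictors, with weights given by the tree gate.

    endo:      (batch, endo_dim) endogenous (task) representation
    experts_W: (2**depth, endo_dim, n_classes), one linear predictor per leaf
    Returns (batch, n_classes) class probabilities.
    """
    gate = tree_gate(exo, gate_W, gate_b, depth)
    logits = np.einsum('bd,ldc->blc', endo, experts_W)
    preds = softmax(logits, axis=-1)
    return np.einsum('bl,blc->bc', gate, preds)

def uniform_cross_entropy(exo_probs_from_endo, eps=1e-12):
    """Assumed stand-in for a dispelling term: cross-entropy between the uniform
    distribution and an exogenous prediction made from the endogenous features.
    It is smallest when that prediction carries no exogenous information."""
    k = exo_probs_from_endo.shape[1]
    return -np.mean(np.sum(np.log(exo_probs_from_endo + eps) / k, axis=1))
```

In this sketch, the gate depends only on the exogenous representation, so each leaf predictor is free to specialize on one region of the exogenous space, while the auxiliary loss pushes the endogenous features toward carrying no exogenous information.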
