ARBEx: Attentive Feature Extraction with Reliability Balancing for Robust Facial Expression Learning

In this paper, we introduce ARBEx, a novel attentive feature extraction framework driven by a Vision Transformer, with reliability balancing to cope with poor class distributions, bias, and uncertainty in the facial expression learning (FEL) task. We apply several data pre-processing and refinement methods together with a window-based cross-attention ViT to make the most of the data. We also employ learnable anchor points in the embedding space, together with label distributions and a multi-head self-attention mechanism, to improve performance on weak predictions through reliability balancing. Reliability balancing is a strategy that leverages anchor points, attention scores, and confidence values to make label predictions more resilient. To ensure correct label classification and improve the model's discriminative power, we introduce an anchor loss, which encourages large margins between anchor points. Additionally, the multi-head self-attention mechanism, which is also trainable, plays an integral role in identifying accurate labels. Together, these components improve the reliability of predictions and have a substantial positive effect on final prediction performance. Our adaptive model can be integrated with any deep neural network to address challenges in various recognition tasks. Extensive experiments in a variety of settings show that our strategy outperforms current state-of-the-art methods.
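The abstract describes the anchor loss and reliability balancing only at a high level. The sketch below illustrates one plausible reading of the two ideas, under explicit assumptions that are not taken from the paper. Anchors are assumed to be stored as an array of shape (C, K, D), that is, K anchor points of dimension D for each of C classes. The anchor loss is assumed to be the negative of the smallest Euclidean distance between anchors of different classes, so minimizing it pushes the closest pair apart. The confidence of a label distribution is assumed to be one minus its normalized Shannon entropy. The function names `anchor_loss`, `confidence`, and `balance`, and the weighting rule inside `balance`, are hypothetical.

```python
import numpy as np


def anchor_loss(anchors: np.ndarray) -> float:
    """Negative minimum distance between anchors of different classes.

    Assumption: `anchors` has shape (C, K, D). Minimizing the returned
    value pushes the closest pair of cross-class anchors further apart,
    which is one way to encourage large margins between anchor points.
    """
    c, k, d = anchors.shape
    flat = anchors.reshape(c * k, d)
    labels = np.repeat(np.arange(c), k)          # class index of each anchor
    diff = flat[:, None, :] - flat[None, :, :]   # all pairwise differences
    dist = np.linalg.norm(diff, axis=-1)         # pairwise Euclidean distances
    mask = labels[:, None] != labels[None, :]    # keep cross-class pairs only
    return -float(dist[mask].min())


def confidence(p, eps: float = 1e-12) -> float:
    """One minus normalized Shannon entropy (assumes at least 2 classes).

    Returns about 1 for a one-hot distribution and about 0 for a uniform one.
    """
    p = np.asarray(p, dtype=float)
    entropy = -np.sum(p * np.log(p + eps))
    # Clip to [0, 1] to absorb tiny numerical excursions caused by eps.
    return float(np.clip(1.0 - entropy / np.log(p.size), 0.0, 1.0))


def balance(p_model, p_anchor, p_attn) -> np.ndarray:
    """Hypothetical reliability balancing rule (not the paper's formula).

    The anchor-based and attention-based distributions are averaged with
    weights proportional to their confidences; the result is then blended
    with the model prediction according to the model's own confidence.
    """
    p_model = np.asarray(p_model, dtype=float)
    p_anchor = np.asarray(p_anchor, dtype=float)
    p_attn = np.asarray(p_attn, dtype=float)

    weights = np.array([confidence(p_anchor), confidence(p_attn)])
    if weights.sum() <= 1e-12:
        weights = np.array([0.5, 0.5])  # fall back to a plain average
    weights = weights / weights.sum()
    correction = weights[0] * p_anchor + weights[1] * p_attn

    c_m = confidence(p_model)
    out = c_m * p_model + (1.0 - c_m) * correction
    return out / out.sum()  # renormalize to a valid distribution
```

With these assumptions, a fully confident (one-hot) model prediction passes through `balance` unchanged, while a uniform, uninformative prediction is replaced almost entirely by the confidence-weighted anchor and attention distributions.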
