Label quality in AffectNet: results of crowd-based re-annotation

AffectNet is one of the most popular resources for facial expression recognition (FER) on relatively unconstrained, in-the-wild images. However, since each image was annotated by only a single annotator with limited consistency checks, label quality may be limited. Here, we follow an approach similar to a study that re-labeled the smaller FER2013 dataset with crowd-sourced annotations and report results from a crowd-based re-annotation of a subset of difficult AffectNet faces, in which 13 people provided both an expression label and valence and arousal ratings for each face. Our results show that the human labels overall have medium to good consistency, whereas the human ratings, especially for valence, are in excellent agreement. Importantly, however, the crowd-based labels shift significantly towards the neutral and happy categories, and the crowd-based affective ratings form a consistent pattern that differs from the original ratings. ResNets fully trained on the original AffectNet dataset fail to predict the human voting patterns, whereas weakly-trained ResNets predict them much better, particularly for valence. Our results have important ramifications for label quality in affective computing.
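The abstract summarizes two kinds of analyses without showing them: inter-rater agreement over the 13 crowd annotators, and a comparison between a network's output distribution and the crowd's voting pattern. The sketch below illustrates one plausible way to compute these; it is not the paper's code. The array names (`labels`, `valence`, `votes`), the placeholder data, and the specific choices of Fleiss' kappa, mean pairwise correlation, and KL divergence are illustrative assumptions.

```python
# Minimal sketch (assumed, not the authors' pipeline) of agreement and
# model-vs-crowd analyses for a 13-annotator re-labeling study.
import numpy as np
from scipy.stats import entropy
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

rng = np.random.default_rng(0)
n_images, n_raters, n_classes = 100, 13, 8  # AffectNet's 8 expression classes

# Placeholder data standing in for the crowd annotations.
labels = rng.integers(0, n_classes, size=(n_images, n_raters))  # category codes
valence = rng.uniform(-1.0, 1.0, size=(n_images, n_raters))     # continuous ratings

# Categorical agreement: Fleiss' kappa across all 13 annotators.
table, _ = aggregate_raters(labels)
print("Fleiss' kappa (labels):", fleiss_kappa(table, method="fleiss"))

# Continuous agreement: mean pairwise Pearson correlation between raters.
r = np.corrcoef(valence.T)                       # (13, 13) rater-by-rater matrix
mean_r = r[np.triu_indices(n_raters, k=1)].mean()
print("Mean pairwise r (valence):", mean_r)

# Model vs. crowd: KL divergence between the crowd's voting pattern for one
# image and a (placeholder) network softmax output.
votes = np.bincount(labels[0], minlength=n_classes) / n_raters
softmax = rng.dirichlet(np.ones(n_classes))      # stand-in for a ResNet output
print("KL(crowd || model):", entropy(votes, softmax))
```

Any of these metrics could be swapped for alternatives (e.g., Krippendorff's alpha or an ICC for the ratings); the point is only that labels call for a chance-corrected categorical statistic, while valence/arousal ratings call for a continuous-agreement measure.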
