Bootstrap Model Ensemble and Rank Loss for Engagement Intensity Regression

This paper presents our approach to the engagement intensity regression task of EmotiW 2019. The task is to predict the engagement intensity of a student watching an online MOOC video under various conditions. Building on our winning solution from last year, we mainly explore head and body features, together with a bootstrap strategy and two novel loss functions. We keep the multi-instance learning framework with a long short-term memory (LSTM) network and make three contributions. First, in addition to gaze and head pose features, we explore facial landmark features in our framework. Second, inspired by the fact that engagement intensity values can be ranked, we design a rank loss as a regularizer that enforces a distance margin between the features of distant category pairs and those of adjacent category pairs. Third, we use classical bootstrap aggregation for model ensembling: the training data is randomly resampled several times, and the predictions of the resulting models are averaged. We evaluate our method on the validation set and discuss the influence of each component. Our method placed 3rd with an MSE of 0.0626 on the test set. https://github.com/kaiwang960112/EmotiW_2019_engagement_regression
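As an illustration of the bootstrap aggregation step described above, the following is a minimal sketch rather than the implementation used in the paper. It assumes a toy one-dimensional least-squares regressor as a stand-in for the LSTM-based model; the names fit_line, bagging_predict, and n_models, as well as the synthetic data, are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Stand-in base model (assumption): 1-D least-squares line fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return slope, mean_y - slope * mean_x

def bagging_predict(train_x, train_y, test_x, n_models=5, seed=0):
    """Train n_models regressors on bootstrap resamples and average them."""
    rng = random.Random(seed)
    n = len(train_x)
    all_preds = []
    for _ in range(n_models):
        # Bootstrap resample: draw n training indices with replacement.
        idx = rng.choices(range(n), k=n)
        slope, intercept = fit_line([train_x[i] for i in idx],
                                    [train_y[i] for i in idx])
        all_preds.append([slope * x + intercept for x in test_x])
    # Ensemble prediction: mean over the models for each test sample.
    return [sum(p) / n_models for p in zip(*all_preds)]

# Synthetic data (hypothetical): intensity-like targets around 0.2 + 0.6 * x.
data_rng = random.Random(42)
xs = [data_rng.random() for _ in range(200)]
ys = [0.2 + 0.6 * x + data_rng.gauss(0.0, 0.05) for x in xs]

ensemble_pred = bagging_predict(xs[:150], ys[:150], xs[150:])
mse = sum((p - y) ** 2 for p, y in zip(ensemble_pred, ys[150:])) / 50
print(f"validation MSE of the bagged ensemble: {mse:.4f}")
```

Averaging over resampled models mainly reduces the variance of the prediction; the number of resamples and the resample size are free choices in this sketch and are not taken from the paper.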
