Predicting Engagement Intensity in the Wild Using Temporal Convolutional Network

Engagement is the holy grail of learning, whether in a classroom setting or on an online learning platform. Studies have shown that knowing a student's engagement level during learning benefits both the student and the teacher. Tracking the engagement of each student is difficult in face-to-face learning in a large classroom, and even more so on an online learning platform, where users access the material at different times. Automatic analysis of student engagement is more scalable and can help to better understand the state of the student in both classroom settings and online learning platforms. In this paper we propose a framework that uses a Temporal Convolutional Network (TCN) to estimate the engagement intensity of students watching video material from Massive Open Online Courses (MOOCs). The input to the TCN is a set of statistical features computed over 10-second segments of the video from the gaze, head pose, and action unit intensities extracted with the OpenFace library. The TCN architecture's ability to capture long-term dependencies allows it to outperform other sequential models such as LSTMs. On the given test set of the EmotiW 2018 sub-challenge "Engagement in the Wild", the proposed approach with a Dilated TCN achieved a mean squared error of 0.079.
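The segment-level pooling described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the frame rate, feature dimensionality, and the choice of mean and standard deviation as the pooled statistics are assumptions for demonstration.

```python
import numpy as np

# Assumed constants (not specified in the abstract).
FPS = 30              # hypothetical video frame rate
SEGMENT_SECONDS = 10  # segment length stated in the paper


def segment_features(frame_features, fps=FPS, seconds=SEGMENT_SECONDS):
    """Pool frame-level features (e.g. OpenFace gaze, head pose, AU
    intensities) into per-segment statistics.

    frame_features: array of shape (num_frames, num_features)
    returns: array of shape (num_segments, 2 * num_features),
             concatenating per-segment mean and standard deviation.
    """
    seg_len = fps * seconds
    num_segments = frame_features.shape[0] // seg_len
    # Drop trailing frames that do not fill a whole segment.
    segs = frame_features[: num_segments * seg_len].reshape(
        num_segments, seg_len, -1
    )
    return np.concatenate([segs.mean(axis=1), segs.std(axis=1)], axis=1)


# Example: 60 seconds of synthetic frame features, 35 dimensions.
frames = np.random.rand(60 * FPS, 35)
feats = segment_features(frames)
print(feats.shape)  # (6, 70): six 10 s segments, mean + std per feature
```

The resulting sequence of segment vectors is what a sequence model such as the Dilated TCN would consume, one feature vector per 10-second step.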
