Attention Network for Engagement Prediction in the Wild

Analysis of student engagement in an e-learning environment can facilitate effective task accomplishment and learning. Generally, engagement/disengagement can be estimated from facial expressions, body movements, and gaze patterns. The focus of this Ph.D. work is automatic assessment of student engagement while watching Massive Open Online Course (MOOC) video material in real-world environments. Most prior work in this area has focused on engagement assessment in lab-controlled environments, and moving from such settings to real-world scenarios involves several challenges, such as face tracking, illumination, occlusion, and context. The early work in this Ph.D. project explores student engagement while watching MOOCs. The absence of a publicly available dataset in the domain of user engagement motivated the collection of a new dataset, containing 195 videos captured from 78 subjects, amounting to about 16.5 hours of recording. The dataset was independently annotated by different labelers, and the final label was derived from a statistical analysis of the individual annotators' labels. Several traditional machine learning algorithms and deep learning based networks were used to establish baselines on the dataset. Engagement prediction and localization are modeled as a Multi-Instance Learning (MIL) problem. This work also studies the importance of the Hierarchical Attention Network (HAN). The architecture is motivated by the hierarchical nature of the problem: a video is made up of segments, and segments are made up of frames.
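The hierarchical structure described above (frames pooled into segments, segments pooled into a video representation) can be illustrated with a minimal attention-pooling sketch. This is not the actual HAN model from the work; it assumes simple dot-product attention with randomly initialized (hypothetical) attention vectors `w_frame` and `w_seg`, purely to show the two levels of pooling.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(features, w):
    """Attention pooling: score each row of `features` against vector `w`,
    softmax the scores, and return the weighted average row."""
    scores = features @ w          # one scalar score per row, shape (n,)
    alpha = softmax(scores)        # attention weights, sum to 1
    return alpha @ features        # pooled representation, shape (d,)

rng = np.random.default_rng(0)
d = 8                              # toy feature dimension
# A toy "video" of 3 segments, each with 5 frames of d-dim features
# (in practice these would be per-frame face/gaze descriptors)
video = rng.standard_normal((3, 5, d))

w_frame = rng.standard_normal(d)   # frame-level attention vector (hypothetical)
w_seg = rng.standard_normal(d)     # segment-level attention vector (hypothetical)

# Level 1: frame-level attention pools each segment's frames into one vector
segments = np.stack([attend(seg, w_frame) for seg in video])
# Level 2: segment-level attention pools segment vectors into a video vector
video_repr = attend(segments, w_seg)
print(video_repr.shape)            # video-level representation of dimension d
```

In a trained HAN the attention vectors are learned, so frames and segments that are informative for engagement receive higher weights; under a MIL view, this lets the model both predict a video-level engagement label and localize which segments drove it.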
