Unsupervised Deep Autoencoders for Feature Extraction with Educational Data

This paper describes methods for automatically extracting student-modeling features from educational data, and from students’ interaction-log data in particular, by training deep neural networks without supervision. Several types of autoencoder networks and structures are discussed, including deep neural networks, recurrent neural networks, variational autoencoders, convolutional neural networks, and asymmetric network structures. Autoencoder networks are trained to find low-dimensional, predictive embeddings of raw interaction-log data. These embeddings then serve as input features for supervised classification models. We discuss the implications of training these network structures on educational data, including peculiarities of interaction-log data that are less commonly encountered in domains such as computer vision and natural language processing. Methods for evaluating the network training process are also discussed, with examples showing how visualizing neuron activations helps to diagnose common training problems and to verify that embedded representations of the data follow the desired distributions. We provide an example of how automatically extracted features can be used in a classification problem, the detection of student affect. In this example, student boredom was detected at levels above chance (area under the receiver operating characteristic curve = .673 versus .5 for chance). Finally, opportunities for future work are discussed, including transfer learning and semi-supervised methods.
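
The pipeline summarized above (unsupervised autoencoder, low-dimensional embedding, supervised classifier scored by area under the ROC curve) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the synthetic data stands in for interaction-log features, and the single tanh hidden layer, the least-squares scorer, and the names train_autoencoder and auc are assumptions chosen only to keep the example short.

```python
import numpy as np

def train_autoencoder(X, n_hidden=2, lr=0.01, epochs=2000, seed=0):
    """Fit a one-hidden-layer autoencoder (tanh encoder, linear decoder)
    by full-batch gradient descent on squared reconstruction error."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(0.0, 0.1, (d, n_hidden))
    b_enc = np.zeros(n_hidden)
    W_dec = rng.normal(0.0, 0.1, (n_hidden, d))
    b_dec = np.zeros(d)
    losses = []
    for _ in range(epochs):
        Z = np.tanh(X @ W_enc + b_enc)        # low-dimensional embedding
        err = (Z @ W_dec + b_dec) - X         # reconstruction error
        losses.append(float(np.mean(np.sum(err ** 2, axis=1))))
        g_out = 2.0 * err / n                 # d(loss)/d(reconstruction)
        g_hid = (g_out @ W_dec.T) * (1.0 - Z ** 2)  # back through tanh
        W_dec -= lr * (Z.T @ g_out)
        b_dec -= lr * g_out.sum(axis=0)
        W_enc -= lr * (X.T @ g_hid)
        b_enc -= lr * g_hid.sum(axis=0)

    def encode(A):
        return np.tanh(A @ W_enc + b_enc)

    return encode, losses

def auc(y_true, scores):
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs ranked correctly, counting ties as half."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

# Hypothetical stand-in for interaction-log features: 8 observed columns
# driven by 2 latent factors, one of which also drives the binary label.
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 8)) + 0.1 * rng.normal(size=(200, 8))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = (latent[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

# Unsupervised step: the labels are never shown to the autoencoder.
encode, losses = train_autoencoder(X)
Z = encode(X)

# Supervised step: a least-squares linear scorer on the embeddings,
# fit on the first 150 rows and evaluated on the remaining 50.
Z1 = np.hstack([Z, np.ones((len(Z), 1))])
w, *_ = np.linalg.lstsq(Z1[:150], y[:150].astype(float), rcond=None)
test_auc = auc(y[150:], Z1[150:] @ w)
print(f"reconstruction loss {losses[0]:.3f} -> {losses[-1]:.3f}")
print(f"held-out AUC {test_auc:.3f} (chance = 0.5)")
```

Because the labels play no part in fitting the autoencoder, the same embeddings could be reused for other supervised tasks; any classifier could replace the least-squares scorer used here.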
