Temporal Models for Predicting Student Dropout in Massive Open Online Courses

Over the past few years, the rapid emergence of massive open online courses (MOOCs) has sparked a great deal of research interest in MOOC data analytics. Dropout prediction, that is, identifying students at risk of dropping out of a course, is an important problem because of the high attrition rates found on many MOOC platforms. Recently proposed methods for dropout prediction apply relatively simple machine learning models, such as support vector machines and logistic regression, to features that reflect student activities on a MOOC platform during the study period of a course, for example lecture video watching and forum participation. Because these features are captured continuously for each student over a period of time, dropout prediction is essentially a time series prediction problem. Treating dropout prediction as a sequence classification problem, we propose several temporal models for solving it. In extensive experiments on two MOOCs offered on Coursera and edX, a recurrent neural network (RNN) model with long short-term memory (LSTM) cells outperforms the baseline methods, as well as our other proposed models, by a large margin.
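To make the sequence classification framing concrete, the following is a minimal sketch of an LSTM forward pass that reads one feature vector per week and emits a single dropout probability for the student. It is an illustration under stated assumptions, not the model evaluated in the experiments: the class name `TinyLSTMClassifier`, the two weekly features (videos watched, forum posts), the hidden size, and the untrained random weights are all hypothetical, and training is omitted entirely.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class TinyLSTMClassifier:
    """Hypothetical, untrained LSTM sequence classifier (illustration only)."""

    def __init__(self, n_features, n_hidden, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        n_in = n_features + n_hidden  # each gate sees [x_t, h_{t-1}]
        # One weight matrix and bias per gate:
        # i = input gate, f = forget gate, o = output gate, c = candidate cell.
        self.W = {
            g: [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                for _ in range(n_hidden)]
            for g in "ifoc"
        }
        self.b = {g: [0.0] * n_hidden for g in "ifoc"}
        # Logistic output layer applied to the final hidden state.
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
        self.b_out = 0.0

    def _gate(self, name, z, activation):
        pre = matvec(self.W[name], z)
        return [activation(p + b) for p, b in zip(pre, self.b[name])]

    def step(self, x, h, c):
        # One LSTM time step: update cell state c and hidden state h.
        z = list(x) + list(h)
        i = self._gate("i", z, sigmoid)
        f = self._gate("f", z, sigmoid)
        o = self._gate("o", z, sigmoid)
        g = self._gate("c", z, math.tanh)
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def predict_proba(self, sequence):
        # Consume the whole sequence, then classify from the last hidden state.
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        for x in sequence:
            h, c = self.step(x, h, c)
        logit = sum(w * hk for w, hk in zip(self.w_out, h)) + self.b_out
        return sigmoid(logit)


# Hypothetical student: [videos watched, forum posts] for each of three weeks.
weekly_activity = [[5.0, 2.0], [3.0, 0.0], [0.0, 0.0]]
model = TinyLSTMClassifier(n_features=2, n_hidden=4, seed=0)
p = model.predict_proba(weekly_activity)
print(f"Predicted dropout probability (untrained weights): {p:.3f}")
```

Because the weights are random, the printed probability carries no meaning; the point is only that a variable-length sequence of weekly activity vectors is mapped to one label-level score, which is what distinguishes this formulation from fitting a classifier to a single aggregated feature vector.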
