Delving Deeper into MOOC Student Dropout Prediction

To obtain reliable accuracy estimates for automatic MOOC dropout predictors, it is important to train and test them in a manner consistent with how they will be used in practice. Yet most prior research on MOOC dropout prediction has measured test accuracy on the same course used to train the classifier, which can lead to overly optimistic accuracy estimates. To better understand how accuracy is affected by the training-and-testing regime, we compared the accuracy of a standard dropout prediction architecture (clickstream features + logistic regression) across four different training paradigms. Results suggest that (1) training and testing on the same course ("post-hoc") can overestimate accuracy by several percentage points; (2) dropout classifiers trained on proxy labels based on students' persistence are surprisingly competitive with post-hoc training (87.33% versus 90.20% AUC, averaged over 8 weeks of 40 HarvardX MOOCs); and (3) classifier performance does not vary significantly with academic discipline. Finally, we also explore new dropout prediction architectures based on deep, fully connected, feed-forward neural networks and find that (4) networks with as many as 5 hidden layers can yield statistically significant gains in test accuracy over logistic regression.
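
The baseline architecture named above is straightforward to sketch. The snippet below is a minimal illustration, not the paper's actual pipeline: the four clickstream features, the persistence-based proxy rule (a student with no activity in the following week is labeled a dropout), and the synthetic data are all assumptions made for demonstration.

```python
# Minimal sketch of the baseline predictor: weekly clickstream
# counts fed to logistic regression. Feature choices and the
# persistence-based proxy label are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical per-student weekly clickstream features:
# [video plays, forum posts, problem attempts, page views]
X = rng.poisson(lam=[5.0, 1.0, 8.0, 30.0], size=(1000, 4)).astype(float)

# Proxy label: no activity in the following week counts as dropout (1).
next_week_events = rng.poisson(lam=X.sum(axis=1) / 20.0)
y = (next_week_events == 0).astype(int)

# "Post-hoc" regime: train and test within the same course.
split = 800
clf = LogisticRegression(max_iter=1000)
clf.fit(X[:split], y[:split])
probs = clf.predict_proba(X[split:])[:, 1]
print(f"AUC: {roc_auc_score(y[split:], probs):.4f}")
```

The cross-course paradigms differ only in the split: train on one set of courses and evaluate on a held-out course, rather than on a held-out slice of the same course as above.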

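A similar sketch of the deeper architecture, assuming the same synthetic setup: a fully connected feed-forward network with 5 hidden layers. The layer widths, the scaler, and the use of scikit-learn's MLPClassifier are assumptions; this excerpt does not specify the authors' implementation.

```python
# Sketch of the deep variant: a fully connected feed-forward
# network with 5 hidden layers. Widths and training settings
# are illustrative assumptions, not the paper's configuration.
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.poisson(lam=[5.0, 1.0, 8.0, 30.0], size=(1000, 4)).astype(float)
y = (rng.poisson(lam=X.sum(axis=1) / 20.0) == 0).astype(int)  # proxy labels
split = 800

mlp = make_pipeline(
    StandardScaler(),  # raw clickstream counts vary widely in scale
    MLPClassifier(
        hidden_layer_sizes=(64, 64, 32, 32, 16),  # 5 hidden layers
        activation="relu",
        max_iter=500,
        random_state=0,
    ),
)
mlp.fit(X[:split], y[:split])
probs = mlp.predict_proba(X[split:])[:, 1]
print(f"MLP AUC: {roc_auc_score(y[split:], probs):.4f}")
```

Comparing this AUC against the logistic-regression baseline on the same split mirrors, in miniature, the comparison reported in the abstract.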