MOOC Dropout Prediction: How to Measure Accuracy?

To obtain reliable accuracy estimates for automatic MOOC dropout predictors, it is important to train and test them in a manner consistent with how they will be used in practice. Yet most prior research on MOOC dropout prediction has measured test accuracy on the same course used for training, which can lead to overly optimistic accuracy estimates. To better understand how accuracy is affected by the training and testing regime, we compared the accuracy of a standard dropout prediction architecture (clickstream features + logistic regression) across four different training paradigms. Results suggest that (1) training and testing on the same course ("post-hoc") can significantly overestimate accuracy, and (2) training dropout classifiers using proxy labels based on students' persistence, which are available before a MOOC finishes, is surprisingly competitive with post-hoc training (87.33% vs. 90.20% AUC, averaged over 8 weeks of 40 HarvardX MOOCs) and can support real-time MOOC interventions.
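The difference between evaluation regimes can be illustrated with a minimal sketch. It is not the authors' pipeline: the two synthetic "courses", their two features, the weight vectors, and the helper names (`make_course`, `fit_logistic`, `predict`, `auc`) are all hypothetical, and the post-hoc regime is approximated here by training and testing on disjoint students of the same course. The sketch fits a small logistic regression by gradient descent and scores it with AUC, once on the course it was trained on and once on a different course.

```python
import math
import random


def auc(scores, labels):
    """Area under the ROC curve: probability that a random positive
    example is scored above a random negative one (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Batch gradient descent on the logistic loss; returns (weights, bias)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xij in enumerate(xi):
                gw[j] += err * xij
            gb += err
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(model, X):
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in X]


def make_course(rng, n, true_w):
    """Hypothetical synthetic course: two clickstream-like features per
    student; label 1 means the student dropped out."""
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        p = sigmoid(sum(wj * xj for wj, xj in zip(true_w, x)))
        X.append(x)
        y.append(1 if rng.random() < p else 0)
    return X, y


rng = random.Random(0)
Xa, ya = make_course(rng, 400, true_w=[2.0, -1.0])
# Assumption: course B links behavior to dropout differently from course A.
Xb, yb = make_course(rng, 400, true_w=[0.5, -2.0])

split = len(Xb) // 2
X_test, y_test = Xb[split:], yb[split:]

# Same-course regime: train on some students of B, test on the others.
auc_same = auc(predict(fit_logistic(Xb[:split], yb[:split]), X_test), y_test)

# Cross-course regime: train on course A, test on the same students of B.
auc_cross = auc(predict(fit_logistic(Xa, ya), X_test), y_test)

print(f"same-course AUC:  {auc_same:.3f}")
print(f"cross-course AUC: {auc_cross:.3f}")
```

Both regimes are scored on the same held-out students, so any gap between the two numbers reflects only where the training data came from. In a real deployment, same-course labels are unavailable until the course ends, which is the motivation for the cross-course and proxy-label paradigms.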
