Student success prediction in MOOCs

Predictive models of student success in Massive Open Online Courses (MOOCs) are a critical component of effective content personalization and adaptive interventions. In this article we review the state of the art in predictive models of student success in MOOCs and present a categorization of MOOC research according to the predictors (features), predictions (outcomes), and underlying theoretical model. We critically survey work in each category, reporting the raw data source, feature engineering, statistical model, evaluation method, prediction architecture, and other aspects of these experiments. Such a review is particularly useful given the rapid expansion of predictive modeling research in MOOCs since the emergence of major MOOC platforms in 2012. The survey reveals several key methodological gaps: extensive filtering of experimental subpopulations, ineffective student model evaluation, and the use of experimental data that would be unavailable for real-world student success prediction and intervention, the ultimate goal of such models. Finally, we highlight opportunities for future research, including temporal modeling, research bridging predictive and explanatory student models, work that contributes to learning theory, and evaluation of long-term learner success in MOOCs.
