Temporal predication of dropouts in MOOCs: Reaching the low hanging fruit through stacking generalization

Massive open online courses (MOOCs) have recently taken center stage in discussions surrounding online education, both for their potential and for their high dropout rates. The high attrition rates associated with MOOCs have often been described in terms of a scale-efficacy tradeoff. Building on the large enrollments associated with MOOCs and the ability to track individual student performance, this study takes an initial step towards a mechanism for the early and accurate identification of students at risk of dropping out. Focusing on struggling students who remain active in course discussion forums and who are already more likely to finish a course, we design a temporal modeling approach that prioritizes at-risk students in order of their likelihood of dropping out. By identifying only a small subset of at-risk students, we seek to provide systematic insight for instructors so that they can target support at the students most in need of intervention. Moreover, we propose appending historical features to the current week's features for model building, and we introduce principal component analysis (PCA) to identify the breakpoint at which the features of previous weeks are turned off. This appended modeling method is shown to outperform simpler temporal models that merely sum features. To deal with the kind of data variability presented by MOOCs, this study illustrates the effectiveness of an ensemble stacking generalization approach, which builds more robust and accurate prediction models than the direct application of base learners.

Propose a temporal modeling approach for students' dropout behavior in MOOCs.
Demonstrate the advantage of an appended feature modeling space based on PCA over a summed feature modeling space.
Explore the power of the ensemble learning method (stacking generalization) in enhancing prediction ability.
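
The difference between the summed and the appended feature spaces can be made concrete with a minimal sketch. It is an illustration under stated assumptions, not the study's implementation: the three weekly features (forum posts, page views, quiz attempts) and the function names are hypothetical, and the `history` parameter merely stands in for the breakpoint that the abstract says is chosen with PCA.

```python
from typing import List

# Hypothetical weekly activity per student: [forum_posts, page_views, quiz_attempts].
WeeklyFeatures = List[List[float]]


def summed_features(weeks: WeeklyFeatures, current_week: int) -> List[float]:
    """Baseline: add up each feature over weeks 1..current_week."""
    return [sum(column) for column in zip(*weeks[:current_week])]


def appended_features(
    weeks: WeeklyFeatures, current_week: int, history: int
) -> List[float]:
    """Concatenate the current week's features with those of the
    `history` preceding weeks, most recent week first.

    Weeks that do not exist yet are zero-padded so that every student
    gets a vector of the same length. `history` is an assumed stand-in
    for the PCA-derived breakpoint; it is not computed here.
    """
    n_features = len(weeks[0])
    vector: List[float] = []
    for offset in range(history + 1):
        index = current_week - 1 - offset
        vector.extend(weeks[index] if index >= 0 else [0.0] * n_features)
    return vector


# One made-up student over three weeks.
student = [[2.0, 10.0, 1.0], [0.0, 4.0, 1.0], [1.0, 0.0, 0.0]]
summed = summed_features(student, current_week=3)
appended = appended_features(student, current_week=3, history=1)
```

For this made-up student the summed vector hides the fact that page views fell from 10 to 4 to 0, whereas the appended vector keeps each week's values in separate columns, so a model can see the decline.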
