Forecasting student achievement in MOOCs with natural language processing

Student intention and motivation are among the strongest predictors of persistence and completion in Massive Open Online Courses (MOOCs), but these factors are typically measured through fixed-response items that constrain student expression. We use natural language processing techniques to evaluate whether text analysis of open-response questions about motivation and utility value can improve predictions of persistence and completion over and above the information obtained from fixed-response items. We find that a machine learning prediction model can learn from unstructured text to predict which students will complete an online course, and that it performs well out-of-sample relative to a simple benchmark based on a standard array of demographics. These results demonstrate the potential for natural language processing to contribute to predicting student success in MOOCs and other forms of open online learning.
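To make the general idea concrete, the following is a minimal sketch of predicting completion from open-response text and checking the result on held-out students. It is not the model used in this study: the classifier (a bag-of-words Naive Bayes with add-one smoothing), the evaluation metric (area under the ROC curve), the function names, and the toy responses are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a response and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train_naive_bayes(texts, labels):
    """Collect per-class word counts for labels in {0, 1} (1 = completed)."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter(labels)
    for text, label in zip(texts, labels):
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def completion_score(model, text):
    """Log-odds of completion for one response, with add-one smoothing."""
    word_counts, class_counts, vocab = model
    score = math.log(class_counts[1] / class_counts[0])
    totals = {c: sum(word_counts[c].values()) for c in (0, 1)}
    for tok in tokenize(text):
        if tok not in vocab:
            continue  # ignore words never seen in training
        p1 = (word_counts[1][tok] + 1) / (totals[1] + len(vocab))
        p0 = (word_counts[0][tok] + 1) / (totals[0] + len(vocab))
        score += math.log(p1 / p0)
    return score


def auc(scores, labels):
    """Chance that a random completer outscores a random non-completer (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy responses, invented for this sketch (not study data).
train_texts = [
    "I need this certificate for my career and plan to finish every unit",
    "this course is useful for my job so I will complete all assignments",
    "just browsing to see what the lectures look like",
    "curious about the topic, probably only watching a few videos",
]
train_labels = [1, 1, 0, 0]

# Held-out responses, never seen during training.
test_texts = [
    "I plan to finish because it is useful for my career",
    "just curious, only browsing a few lectures",
]
test_labels = [1, 0]

model = train_naive_bayes(train_texts, train_labels)
test_scores = [completion_score(model, t) for t in test_texts]
print("held-out AUC:", auc(test_scores, test_labels))
```

In a comparison like the one described above, the same held-out metric would also be computed for a model that uses only demographic variables, so that the two can be compared on students neither model saw during training.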
