Machine Learning Approaches to Predict Learning Outcomes in Massive Open Online Courses

With rapid advances in technology, Massive Open Online Courses (MOOCs) have become the most popular form of online educational delivery, largely because they remove geographical and financial barriers for participants. Large numbers of learners around the world enrol in such courses. Despite this flexible access, reported results indicate that completion rates are quite low. Educational Data Mining and Learning Analytics are emerging fields of research that aim to enhance the delivery of education by applying statistical and machine learning approaches. An extensive literature survey indicates that no significant research is available in the area of MOOC data analysis, in particular research that considers the behavioural patterns of users. In this paper, therefore, two sets of features based on learner behavioural patterns are compared in terms of their suitability for predicting the course outcome of learners participating in MOOCs. Our exploratory data analysis demonstrates that there is a strong correlation between clickstream actions and successful learner outcomes. Various machine learning algorithms were applied to improve the accuracy of the classifier models. Simulation results from our investigation show that Random Forest achieved viable performance for our prediction problem, obtaining the highest performance of the models tested. Conversely, Linear Discriminant Analysis achieved the lowest relative performance, although this represented only a marginal reduction relative to Random Forest.
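The exploratory step mentioned above, checking how strongly a behavioural feature tracks the course outcome, can be illustrated with a minimal sketch. The code below is not the authors' pipeline: the feature names (`video_plays`, `forum_posts`), the toy records, and the `pearson` helper are all hypothetical and exist only to show the kind of calculation involved. With a binary outcome, the Pearson coefficient computed here is the point-biserial correlation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy records, NOT data from the paper:
# (video_plays, forum_posts, completed) with completed in {0, 1}.
learners = [
    (2, 0, 0), (5, 1, 0), (8, 0, 0), (12, 2, 0),
    (20, 1, 1), (25, 3, 1), (31, 0, 1), (40, 2, 1),
]
completed = [row[2] for row in learners]

# Correlate each assumed behavioural feature with the binary outcome.
for name, idx in (("video_plays", 0), ("forum_posts", 1)):
    r = pearson([row[idx] for row in learners], completed)
    print(f"{name}: r = {r:.2f}")
```

In this toy data the play counts separate completers from non-completers more cleanly than the post counts do, so the first coefficient comes out larger. On real clickstream logs, a ranking like this would only be a first screening step before fitting classifiers such as Random Forest or Linear Discriminant Analysis.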