Implementation of learning analytics framework for MOOCs using state-of-the-art in-memory computing

Massive open online courses (MOOCs) aim to deliver online courses to tens of thousands, or even millions, of heterogeneous learners at the same time, at minimal or no charge. They offer an alternative way to bring quality education to people who cannot reach premier institutions, and they have great potential to overcome the barriers of traditional learning systems. However, MOOCs face several challenges, such as high drop-out rates, inadequate automated assessment, and widely varying student engagement and attention. Learning analytics, supported by big data technologies, helps address these issues by making it possible to interpret very large volumes of MOOC data in order to assess progress, predict performance, and identify problems. To perform the analytics, we developed a workflow using Apache Spark, a scalable in-memory computing framework. The experiments use data from the edX platform, which contains information on more than 2 lakh (200,000) students from 39 courses. We first carried out a detailed statistical analysis to understand the learning patterns and behavior of online learners. We then developed drop-out prediction models using machine learning algorithms such as Random Forest, Gradient Boosting, and Logistic Regression. Finally, we developed a stacked ensemble model and compared its performance with the baseline models; it outperformed all of them, with an accuracy of 91.2%.
