An empirical comparison of models for dropout prophecy in MOOCs

Massive Open Online Courses (MOOCs) are offered on the web and have become a focal point for students who prefer e-learning. Despite the enormous enrollment in MOOCs, the number of students who drop out of these courses is very high. For MOOCs to succeed, their dropout rates must decrease. Because the proportions of continuing and dropout students differ considerably, the class imbalance problem is observed in nearly all MOOC datasets. Researchers have developed models that predict dropout students in MOOCs using different techniques. The features that drive these models can be obtained during registration and from students' interaction with the MOOC portal. Using the results of these models, appropriate actions can be taken to retain students. In this paper, we created four models using various machine learning techniques on a publicly available dataset. After empirical analysis and evaluation of these models, we found that the model built with the Naïve Bayes technique performed well on the class-imbalanced MOOC data.
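To illustrate why class imbalance matters when such models are evaluated, the following minimal sketch compares plain accuracy with per-class recall and balanced accuracy. It is not the evaluation procedure of this paper: the 90/10 class split, the label names, and the helper functions are hypothetical and chosen only for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of true `positive` examples that were predicted as `positive`."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so every class counts equally."""
    classes = sorted(set(y_true))
    return sum(recall(y_true, y_pred, c) for c in classes) / len(classes)


# Hypothetical toy labels (an assumption, not data from the paper):
# 90 students drop out and 10 continue.
y_true = ["dropout"] * 90 + ["continue"] * 10

# A trivial baseline that always predicts the majority class.
majority = Counter(y_true).most_common(1)[0][0]
y_pred = [majority] * len(y_true)

print("accuracy:", accuracy(y_true, y_pred))
print("recall (continue):", recall(y_true, y_pred, "continue"))
print("balanced accuracy:", balanced_accuracy(y_true, y_pred))
```

On this toy data the baseline looks strong on accuracy while never identifying a single continuing student, which is why measures that weigh each class separately are generally preferred for imbalanced datasets.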
