Early dropout prediction in distance higher education using active learning

Predicting student dropout in higher education is an important and challenging research topic for universities. The successful delivery of a distance learning course is fundamental for educational institutions, so reducing dropout rates is of vital importance. Although the use of machine learning methods in education is relatively new, significant studies dealing with the dropout phenomenon have been presented in recent years. These studies identify several factors that influence successful course completion, while also indicating how complex and difficult accurate early dropout prediction is. The main purpose of this research is to investigate how effectively active learning methodologies can predict student dropout in a web-based distance course in a timely manner. Active learning is a family of methods that try to make effective use of unlabeled data together with a small amount of labeled data. Numerous experiments are conducted with a variety of active learners, indicating that high-risk students can be identified early.
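To make the idea of active learning concrete, the following is a minimal sketch of pool-based uncertainty sampling, one common active learning strategy. It is an illustration under stated assumptions, not the method or data used in this study: the single "engagement score" feature, the dropout threshold of 0.4, the synthetic pool, and all function names are hypothetical. The learner starts with two labeled students and repeatedly asks for the label of the unlabeled student it is least certain about.

```python
import random

# Hypothetical ground truth: a student with an engagement score
# below 0.4 drops out (label 1); otherwise the student completes (label 0).
TRUE_THRESHOLD = 0.4


def oracle(x):
    """Stand-in for the human annotator who reveals a student's true label."""
    return 1 if x < TRUE_THRESHOLD else 0


def bracket(labeled):
    """Largest labeled dropout score and smallest labeled completer score."""
    hi_pos = max(x for x, y in labeled if y == 1)
    lo_neg = min(x for x, y in labeled if y == 0)
    return hi_pos, lo_neg


def fit_boundary(labeled):
    """1-D max-margin rule: the boundary is the midpoint of the bracket."""
    hi_pos, lo_neg = bracket(labeled)
    return (hi_pos + lo_neg) / 2.0


def predict(boundary, x):
    return 1 if x < boundary else 0


def active_learn(pool, seeds, budget):
    """Uncertainty sampling: query the unlabeled point nearest the boundary."""
    labeled = list(seeds)
    seen = {x for x, _ in labeled}
    unlabeled = [x for x in pool if x not in seen]
    for _ in range(budget):
        if not unlabeled:
            break
        boundary = fit_boundary(labeled)
        # The point closest to the boundary is the one the model is least sure of.
        query = min(unlabeled, key=lambda x: abs(x - boundary))
        unlabeled.remove(query)
        labeled.append((query, oracle(query)))
    return labeled


# Synthetic pool of 200 students (assumption: scores uniform in [0, 1)).
rng = random.Random(0)
pool = [rng.random() for _ in range(200)]

# Seed with one labeled student per class: the lowest and highest scores.
seeds = [(min(pool), oracle(min(pool))), (max(pool), oracle(max(pool)))]

labeled = active_learn(pool, seeds, budget=10)
boundary = fit_boundary(labeled)
accuracy = sum(predict(boundary, x) == oracle(x) for x in pool) / len(pool)
print(f"labels used: {len(labeled)}, boundary: {boundary:.3f}, accuracy: {accuracy:.3f}")
```

In this toy setting each query lands inside the interval that still contains the unknown threshold, so the loop behaves much like a binary search and needs only a handful of labels. Real dropout data is noisy and multi-dimensional, so such clean behavior should not be expected there.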
