Estimating student dropout in distance higher education using semi-supervised techniques

Distance higher education has expanded rapidly in recent years owing to advances in, and the integration of, information and communication technologies. Students who attend online distance courses often have family obligations and job commitments and are usually at high risk of dropping out during their studies. It is highly important to identify such students, since paying extra attention and offering support to them could reduce the likelihood of failure or even dropout. The present research studies whether semi-supervised techniques can be useful for predicting student dropout in distance higher education. Semi-supervised learning aims to generate reliable predictions from a small amount of labeled data and a large amount of unlabeled data. Labeled data are often difficult to obtain, as labeling requires experts and a great deal of human effort and time. As far as we are aware, several studies propose and compare supervised methods for predicting student dropout in higher education, but none investigates the effectiveness of semi-supervised methods. The results of our experiments reveal that good predictive accuracy can be achieved with few labeled data, compared with well-known supervised learning algorithms. To that end, we have also developed a web-based tool that estimates whether an individual student is going to drop out.
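The idea of learning from few labeled and many unlabeled examples can be illustrated with self-training, one common self-labeled scheme. The sketch below is only an illustration under stated assumptions and is not the method or data of this study: the nearest-centroid base learner, the two features (weekly study hours, assignments submitted), the `min_margin` threshold, and all values are hypothetical.

```python
from math import dist

def centroids(X, y):
    """Mean feature vector per class label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def predict(cents, x):
    """Return (label, margin): nearest centroid's label and the distance
    gap between the nearest and second-nearest centroids."""
    ranked = sorted((dist(x, c), label) for label, c in cents.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else float("inf")
    return ranked[0][1], margin

def self_train(X_lab, y_lab, X_unl, min_margin=1.0, max_rounds=10):
    """Self-training: repeatedly fit on the labeled set, then move
    confidently predicted unlabeled points into it with their predicted label."""
    X, y = list(X_lab), list(y_lab)
    pool = list(X_unl)
    for _ in range(max_rounds):
        cents = centroids(X, y)
        confident, rest = [], []
        for x in pool:
            label, margin = predict(cents, x)
            if margin >= min_margin:
                confident.append((x, label))
            else:
                rest.append(x)
        if not confident:
            break  # nothing new to learn from; stop early
        for x, label in confident:
            X.append(x)
            y.append(label)
        pool = rest
    return centroids(X, y)

# Hypothetical toy data: (weekly study hours, assignments submitted).
X_lab = [(2.0, 1.0), (9.0, 6.0)]
y_lab = ["dropout", "completed"]
X_unl = [(1.0, 0.0), (3.0, 2.0), (8.0, 5.0), (10.0, 7.0), (5.5, 3.5)]

model = self_train(X_lab, y_lab, X_unl)
label, _ = predict(model, (2.5, 1.5))
```

Only two labeled students are supplied here; the remaining points contribute through their predicted labels, and a point that lies equally far from both classes stays unlabeled because its margin falls below the threshold.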
