Forecasting Students' Performance Using an Ensemble SSL Algorithm

Educational data mining is a growing academic research area that aims to gain significant insights into student behavior, interactions and performance by applying data mining methods to educational data. Over the last decades, a variety of accurate models have been developed to monitor students’ future progress, most of them based on supervised classification methods. In this work, we propose an ensemble semi-supervised algorithm for predicting students’ performance in the final examinations at the end of the academic year. The experimental results demonstrate the efficiency and robustness of the proposed algorithm, in terms of accuracy, compared with several classical classification algorithms.
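The abstract does not describe the internals of the proposed algorithm. The sketch below illustrates only the general idea of ensemble semi-supervised learning, and it rests on several assumptions. A majority-vote ensemble is wrapped in a self-training loop. The base learners are trained on the labeled students. Unlabeled students on whom enough ensemble members agree receive pseudo-labels and join the training set, and the loop repeats until no confident examples remain. The function names (`ensemble_self_training`, `vote`, `k_nearest`, `nearest_centroid`), the choice of base learners, the `min_agreement` threshold and the toy "pass"/"fail" data are all hypothetical. They are not the authors' method.

```python
from collections import Counter
from math import dist
from typing import Callable, List, Sequence, Tuple

# Hypothetical sketch of ensemble self-training; NOT the algorithm proposed in the paper.
Vector = Tuple[float, ...]
Predictor = Callable[[Vector], str]
Learner = Callable[[List[Vector], List[str]], Predictor]


def k_nearest(k: int) -> Learner:
    """Return a k-NN learner: majority label among the k closest labeled examples."""
    def fit(X: List[Vector], y: List[str]) -> Predictor:
        def predict(x: Vector) -> str:
            closest = sorted(range(len(X)), key=lambda j: dist(X[j], x))[:k]
            return Counter(y[j] for j in closest).most_common(1)[0][0]
        return predict
    return fit


def nearest_centroid(X: List[Vector], y: List[str]) -> Predictor:
    """Predict the class whose mean feature vector is closest to x."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = tuple(sum(col) / len(rows) for col in zip(*rows))
    return lambda x: min(centroids, key=lambda c: dist(centroids[c], x))


def vote(predictors: Sequence[Predictor], x: Vector) -> Tuple[str, float]:
    """Majority vote; returns the winning label and the fraction of members agreeing."""
    label, count = Counter(p(x) for p in predictors).most_common(1)[0]
    return label, count / len(predictors)


def ensemble_self_training(X_lab, y_lab, X_unlab, learners: Sequence[Learner],
                           min_agreement: float = 1.0, max_iter: int = 10) -> Predictor:
    """Iteratively pseudo-label unlabeled examples on which the ensemble agrees."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_iter):
        predictors = [fit(X_lab, y_lab) for fit in learners]
        confident, rest = [], []
        for x in pool:
            label, agreement = vote(predictors, x)
            if agreement >= min_agreement:
                confident.append((x, label))
            else:
                rest.append(x)
        if not confident:  # nothing confident left (or pool exhausted): stop
            break
        for x, label in confident:
            X_lab.append(x)
            y_lab.append(label)
        pool = rest
    # Retrain every member on the enlarged labeled set and return the voting ensemble.
    final = [fit(X_lab, y_lab) for fit in learners]
    return lambda x: vote(final, x)[0]


# Toy usage: two hypothetical grade features per student.
labeled_X = [(2.0, 3.0), (3.0, 2.0), (15.0, 16.0), (16.0, 15.0)]
labeled_y = ["fail", "fail", "pass", "pass"]
unlabeled_X = [(4.0, 4.0), (14.0, 14.0), (5.0, 3.0), (17.0, 18.0)]
model = ensemble_self_training(labeled_X, labeled_y, unlabeled_X,
                               [k_nearest(1), k_nearest(3), nearest_centroid])
print(model((3.5, 3.5)), model((15.5, 15.0)))
```

Requiring unanimous agreement (`min_agreement=1.0`) is a conservative choice. It admits fewer pseudo-labels but limits how much a single member's mistakes propagate into the training set. Lowering the threshold trades that safety for more unlabeled data.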
