PREDICTING STUDENTS' PERFORMANCE IN DISTANCE LEARNING USING MACHINE LEARNING TECHNIQUES

The ability to predict a student's performance could be useful in many ways in university-level distance learning. Students' key demographic characteristics and their marks on a few written assignments can constitute the training set for a supervised machine learning algorithm. The trained algorithm can then predict the performance of new students, making it a useful tool for identifying likely poor performers early. The scope of this work is to compare some state-of-the-art supervised learning algorithms on this task. Two experiments were conducted with six algorithms, trained on data sets provided by the Hellenic Open University. Among other significant conclusions, the Naïve Bayes algorithm was found to be the most appropriate for building a software support tool: its accuracy is more than satisfactory, its overall sensitivity is extremely satisfactory, and it is the easiest of the six to implement.
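To illustrate the kind of classifier the abstract favours, the following is a minimal sketch of a Naïve Bayes pass/fail predictor over categorical student attributes. The feature names, value bands, and toy data are hypothetical stand-ins for the paper's real inputs (demographics plus written-assignment marks); the paper's actual feature set and data are not reproduced here.

```python
from collections import Counter, defaultdict

# Hypothetical toy training set: each row is
# (gender, employment, assignment_1_band, assignment_2_band) -> final outcome.
TRAIN = [
    (("F", "working", "high", "high"), "pass"),
    (("M", "working", "low", "low"), "fail"),
    (("F", "not_working", "high", "mid"), "pass"),
    (("M", "not_working", "low", "mid"), "fail"),
    (("M", "working", "high", "high"), "pass"),
    (("F", "working", "low", "low"), "fail"),
]

def fit(rows):
    """Estimate class priors and per-attribute conditional value counts."""
    class_counts = Counter(label for _, label in rows)
    cond = defaultdict(Counter)   # (attr_index, label) -> value counts
    values = defaultdict(set)     # attr_index -> distinct values seen
    for features, label in rows:
        for i, v in enumerate(features):
            cond[(i, label)][v] += 1
            values[i].add(v)
    return class_counts, cond, values, len(rows)

def predict(model, features):
    """Pick the class maximising P(class) * prod_i P(feature_i | class),
    using Laplace (add-one) smoothing on the conditional estimates."""
    class_counts, cond, values, n = model
    best, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / n
        for i, v in enumerate(features):
            score *= (cond[(i, label)][v] + 1) / (count + len(values[i]))
        if score > best_score:
            best, best_score = label, score
    return best

model = fit(TRAIN)
print(predict(model, ("F", "working", "high", "mid")))      # -> "pass"
print(predict(model, ("M", "not_working", "low", "low")))   # -> "fail"
```

The simplicity of this estimate-count-and-multiply scheme is one reason the paper concludes Naïve Bayes is the easiest of the compared algorithms to implement in a support tool.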
