An overview and comparison of supervised data mining techniques for student exam performance prediction

Abstract The recent increase in the availability of learning data has given educational data mining importance and momentum as a means to better understand and optimize the learning process and the environments in which it occurs. The aim of this paper is to provide a comprehensive analysis and comparison of state-of-the-art supervised machine learning techniques applied to the task of student exam performance prediction, that is, identifying students at a “high risk” of dropping out of the course and predicting their future achievements, such as final exam scores. For both classification and regression tasks, the highest overall precision was obtained with artificial neural networks trained on student engagement data and past performance data, while the use of demographic data did not show a significant influence on the precision of predictions. To exploit the full potential of student exam performance prediction, it was concluded that adequate data acquisition functionalities and student interaction with the learning environment are prerequisites for ensuring a sufficient amount of data for analysis.
