Predicting Students’ Academic Performance and Main Behavioral Features Using Data Mining Techniques

Learning environments that link students, parents, and teachers in a shared learning process make it possible to study their combined impact on student performance. Data mining can analyze these inter-relationships and thus enable the prediction of academic performance, with the aim of improving students' academic level. The main factors that affect student performance were selected using feature selection methods, and the most important features were then analyzed to better understand the data. One of the main findings is the impact of behavioral features on students' academic performance. The demographic features gender and relation were also found to be important. It was evident that there is an academic disparity between genders, as females constitute the majority of the most outstanding students. Furthermore, mothers have a clear role in students' academic excellence. Six machine learning methods were used and tested to predict student performance, namely random forest, logistic regression, XGBoost, MLP, and ensemble learning using bagging and voting. Of all the methods, random forest achieved the highest accuracy, 77%, using the 10 best selected features. Overfitting was addressed by tuning the hyper-parameters. The results show that data mining can predict students' performance level and highlight the most influential features.
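As an illustration of the voting ensemble mentioned above, the following is a minimal sketch of hard (majority) voting over the class labels predicted by several models. It is not the authors' implementation: the function names `hard_vote` and `accuracy`, the labels "L", "M", and "H", and the per-model predictions are all hypothetical and chosen only to make the example self-contained.

```python
from collections import Counter


def hard_vote(predictions):
    """Combine per-model label lists by majority vote, one label per sample.

    `predictions` is a list with one entry per model; each entry is a list
    of predicted labels, aligned by sample. Ties go to the label that
    appears first among the votes for that sample.
    """
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")

    voted = []
    for sample_votes in zip(*predictions):
        # Count the votes for this sample and keep the most frequent label.
        counts = Counter(sample_votes)
        voted.append(counts.most_common(1)[0][0])
    return voted


def accuracy(predicted, actual):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical predictions from three models for four students
# (illustrative labels: "L" = low, "M" = middle, "H" = high).
model_a = ["H", "M", "L", "M"]
model_b = ["H", "L", "L", "M"]
model_c = ["M", "M", "L", "H"]
true_levels = ["H", "M", "L", "H"]  # hypothetical ground truth

ensemble = hard_vote([model_a, model_b, model_c])
print(ensemble)                         # ['H', 'M', 'L', 'M']
print(accuracy(ensemble, true_levels))  # 0.75
```

Hard voting uses only the predicted labels, so it can combine models of different kinds, such as tree-based and neural classifiers, without requiring calibrated probabilities.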
