Systematic Ensemble Model Selection Approach for Educational Data Mining

Abstract: A large body of research has focused on predicting students' performance in order to support their development. Many institutions aim to improve student performance and education quality, and data mining techniques can help by analyzing and predicting students' performance and by identifying factors that may affect their final marks. To this end, this work first thoroughly explores and analyzes two different datasets at two separate stages of course delivery (20% and 50%, respectively) using multiple graphical, statistical, and quantitative techniques. The feature analysis provides insight into the nature of the features considered and guides the choice of machine learning algorithms and their parameters. Furthermore, this work proposes a systematic approach based on the Gini index and p-value to select a suitable ensemble learner from combinations of six candidate machine learning algorithms. Experimental results show that the proposed ensemble models achieve high accuracy and a low false positive rate at all stages for both datasets.
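To give a sense of the search space behind the selection step, the sketch below enumerates every ensemble of two or more learners that can be formed from six base algorithms and picks the one with the highest score. It is only an illustration under stated assumptions, not the authors' procedure: the learner names are hypothetical placeholders (the abstract does not list the six algorithms), the Gini index is assumed to be the normalized Gini coefficient 2*AUC - 1 (the abstract does not define it), and the p-value test is left out because the abstract does not say how it is applied.

```python
from itertools import combinations

# Hypothetical placeholder names; the abstract does not list the six algorithms.
LEARNERS = ["model_a", "model_b", "model_c", "model_d", "model_e", "model_f"]


def candidate_ensembles(learners, min_size=2):
    """Return every subset of `learners` with at least `min_size` members."""
    return [
        combo
        for size in range(min_size, len(learners) + 1)
        for combo in combinations(learners, size)
    ]


def gini_from_auc(auc):
    """Normalized Gini coefficient, ASSUMING the common definition 2*AUC - 1."""
    return 2.0 * auc - 1.0


def select_ensemble(scores):
    """Pick the candidate with the highest score (an assumed selection rule)."""
    return max(scores, key=scores.get)


candidates = candidate_ensembles(LEARNERS)
print(len(candidates))  # 15 + 20 + 15 + 6 + 1 = 57
```

With six base learners there are 57 candidate ensembles of size two or more, so an exhaustive comparison is cheap; in practice `scores` would map each candidate to a Gini value measured on validation data.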
