Multi-split optimized bagging ensemble model selection for multi-class educational data mining

Predicting students’ academic performance has attracted growing research interest in recent years, as many institutions seek to improve both student performance and education quality. Various data mining techniques can be used to analyze and predict student performance, and they also allow instructors to identify factors that may affect students’ final marks. To that end, this work analyzes two undergraduate datasets collected at two different universities and aims to predict students’ performance at two stages of course delivery (20% and 50%, respectively). This analysis guides both the choice of machine learning algorithms and the optimization of their parameters. The work adopts a systematic multi-split approach based on the Gini index and p-value, which is used to optimize a bagging ensemble learner built from any combination of six candidate base machine learning algorithms. Experimental results show that the proposed bagging ensemble models achieve high accuracy for the target group on both datasets.
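As a rough illustration of the search space described above, the following minimal sketch enumerates every non-empty combination of six base learners and shows the two generic ingredients of bagging: bootstrap resampling and majority voting. The learner names (`A` to `F`), the class labels, and all function names are placeholders chosen for this example, since the abstract does not list them. The Gini index and p-value selection step is not reproduced here.

```python
import random
from collections import Counter
from itertools import combinations

# Placeholder names: the abstract states there are six base learners
# but does not name them, so these identifiers are hypothetical.
BASE_LEARNERS = ["A", "B", "C", "D", "E", "F"]


def candidate_ensembles(learners):
    """Return every non-empty subset of the base learners as a tuple."""
    return [
        combo
        for size in range(1, len(learners) + 1)
        for combo in combinations(learners, size)
    ]


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (the resampling step of bagging)."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(predictions):
    """Combine per-model class predictions for one student.

    Ties are resolved in favour of the label that appears first.
    """
    return Counter(predictions).most_common(1)[0][0]


ENSEMBLES = candidate_ensembles(BASE_LEARNERS)

if __name__ == "__main__":
    rng = random.Random(0)
    print(len(ENSEMBLES), "candidate ensembles")
    print(bootstrap_sample(list(range(5)), rng))
    # Hypothetical class labels, used only to demonstrate the vote.
    print(majority_vote(["Good", "Weak", "Good"]))
```

With six base learners there are 2^6 - 1 = 63 non-empty combinations, so an exhaustive comparison of candidate ensembles remains tractable.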
