Implementation of optimum binning, ensemble learning and re-sampling techniques to predict student's performance

Educational data mining is an emerging area of research that can extract useful information for students as well as for instructors. In this research, we explore data mining techniques that predict students' final grades. We validate our method by conducting experiments on course grade data from North South University, the first private university and one of the leading universities in higher education in Bangladesh. We also extend our ideas by discretising the continuous attributes with equal-width binning and incorporating the result into traditional mining algorithms. However, due to the imbalanced nature of the data, we obtained lower accuracy for the imbalanced classes. We therefore implement two re-sampling techniques: random over-sampling (ROS) and random under-sampling (RUS). Experimental results show that re-sampling techniques can significantly mitigate the problem of an imbalanced dataset in classification and improve the performance of the classification models. Moreover, three ensemble techniques, namely bagging, boosting (AdaBoost) and random forests, have been applied in this research to predict the students' academic performance.
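The two preprocessing steps named in the abstract, equal-width binning and random over-/under-sampling, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `equal_width_bins` and `resample`, the choice of four bins, and the toy scores and grades are all assumptions made for illustration only.

```python
import random
from collections import Counter


def equal_width_bins(values, n_bins):
    """Map each continuous value to a bin index in [0, n_bins - 1],
    using bins of equal width between the minimum and maximum value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values identical: everything falls into a single bin.
        return [0 for _ in values]
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def resample(rows, labels, mode, seed=0):
    """Balance classes by random over-sampling ("ros") or
    random under-sampling ("rus")."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    sizes = [len(members) for members in by_class.values()]
    # ROS grows every class to the largest size; RUS shrinks to the smallest.
    target = max(sizes) if mode == "ros" else min(sizes)
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        if mode == "ros":
            # Keep all originals, then draw the shortfall with replacement.
            extra = [rng.choice(members) for _ in range(target - len(members))]
            chosen = members + extra
        else:
            # Draw a subset without replacement.
            chosen = rng.sample(members, target)
        out_rows.extend(chosen)
        out_labels.extend([label] * len(chosen))
    return out_rows, out_labels


# Hypothetical toy data: one continuous attribute and an imbalanced grade label.
scores = [35.0, 42.5, 58.0, 61.0, 77.5, 83.0, 91.0, 95.0]
grades = ["F", "F", "C", "C", "B", "B", "B", "A"]

binned = equal_width_bins(scores, n_bins=4)
rows = [[b] for b in binned]

ros_rows, ros_labels = resample(rows, grades, "ros")
rus_rows, rus_labels = resample(rows, grades, "rus")

print("bins:", binned)
print("before:", Counter(grades))
print("after ROS:", Counter(ros_labels))
print("after RUS:", Counter(rus_labels))
```

In practice, re-sampling would be applied to the training split only, so that duplicated or discarded records do not distort the evaluation on held-out data.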