Data mining approaches to predict final grade by overcoming class imbalance problem

Data mining has been used for business purposes since its inception; at present it is also applied successfully in new and emerging areas such as education systems. The Government of Bangladesh emphasizes the need to improve the education system. In this research, we use data mining approaches to predict students' final outcome, i.e., the final grade in a particular course, while overcoming the problem of an imbalanced dataset. We implement several re-sampling techniques to balance the dataset so that the classification models achieve better performance. The re-sampling techniques are SMOTE (Synthetic Minority Over-sampling Technique), ROS (Random Over-Sampling), and RUS (Random Under-Sampling). Experimental results show that re-sampling enhances the performance of the classification models developed to predict students' final grade in a particular course.
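The three re-sampling techniques named above can be illustrated with a minimal Python sketch. It is not the implementation used in this research: the toy records, the two features (attendance ratio and midterm score), the grade labels, and all function names are hypothetical, and `smote_like` is a simplified interpolation in the spirit of SMOTE rather than a faithful reproduction of the published algorithm.

```python
import random
from collections import Counter, defaultdict


def _group_by_label(rows, labels):
    """Collect feature rows per class label."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return groups


def random_over_sample(rows, labels, seed=0):
    """ROS: duplicate randomly chosen rows until every class matches the largest."""
    rng = random.Random(seed)
    groups = _group_by_label(rows, labels)
    target = max(len(group) for group in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_under_sample(rows, labels, seed=0):
    """RUS: keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_label(rows, labels)
    target = min(len(group) for group in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        out_rows.extend(rng.sample(group, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


def smote_like(rows, labels, minority, n_new, k=2, seed=0):
    """Simplified SMOTE: interpolate between a minority row and one of its
    k nearest minority neighbours (needs at least two minority rows)."""
    rng = random.Random(seed)
    pool = [row for row, label in zip(rows, labels) if label == minority]
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(pool)
        # Rank the other minority rows by squared Euclidean distance to base.
        neighbours = sorted(
            (p for p in pool if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, mate)))
    return synthetic


# Hypothetical toy data: (attendance ratio, midterm score) -> final grade.
rows = [(0.90, 80), (0.80, 75), (0.85, 70), (0.70, 65), (0.75, 72),
        (0.95, 92), (0.98, 95), (0.40, 35), (0.30, 30)]
labels = ["B", "B", "B", "B", "B", "A", "A", "F", "F"]

ros_rows, ros_labels = random_over_sample(rows, labels)
rus_rows, rus_labels = random_under_sample(rows, labels)
new_f_rows = smote_like(rows, labels, minority="F", n_new=3)

print("original:", Counter(labels))
print("after ROS:", Counter(ros_labels))
print("after RUS:", Counter(rus_labels))
print("synthetic F rows:", new_f_rows)
```

ROS and RUS only repeat or discard existing records, whereas the SMOTE-style step creates new points between existing minority records. In practice, re-sampling is usually applied to the training split only, so that evaluation data keeps the original class distribution.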
