Predicting School Failure and Dropout by Using Data Mining Techniques

This paper proposes applying data mining techniques to predict school failure and dropout. We use real data on 670 middle-school students from Zacatecas, México, and employ white-box classification methods, such as rule induction and decision trees. The experiments attempt to improve the accuracy of predicting which students might fail or drop out by, first, using all the available attributes; next, selecting the best attributes; and finally, rebalancing the data and using cost-sensitive classification. The outcomes are compared and the models with the best results are shown.
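
The abstract names data rebalancing and cost-sensitive classification without giving details. The sketch below illustrates both ideas under stated assumptions and is not the paper's method. It assumes a two-class pass/fail problem with numeric attributes. It uses SMOTE-style oversampling as one common rebalancing method, since the abstract does not say which one was applied. It also uses a hypothetical cost matrix in which missing a failing student costs more than a false alarm. The function names `smote`, `min_cost_label` and `per_class_recall`, and the default costs, are illustrative only.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two numeric attribute vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=5, seed=0):
    """SMOTE-style oversampling (illustrative): create n_new synthetic
    minority-class samples, each interpolated between a random minority
    sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` within the minority class, excluding itself
        neighbours = sorted(
            (s for s in minority if s is not base),
            key=lambda s: euclidean(base, s),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def min_cost_label(p_fail, cost_missed_fail=5.0, cost_false_alarm=1.0):
    """Cost-sensitive decision (hypothetical costs): return the label with
    the lower expected misclassification cost, given P(fail)."""
    expected_cost_if_pass = p_fail * cost_missed_fail          # a failing student is missed
    expected_cost_if_fail = (1.0 - p_fail) * cost_false_alarm  # a passing student is flagged
    return "fail" if expected_cost_if_fail < expected_cost_if_pass else "pass"


def per_class_recall(y_true, y_pred):
    """True-positive rate per class; overall accuracy can hide poor
    recall on the rare class in imbalanced data."""
    recall = {}
    for c in set(y_true):
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recall[c] = hits / total
    return recall
```

With the default costs, `min_cost_label` predicts "fail" whenever P(fail) exceeds 1/6 rather than 1/2, which trades extra false alarms for fewer missed at-risk students. A classifier that labels every student "pass" on a 9:1 pass/fail set reaches 90% accuracy, yet `per_class_recall` shows a recall of 0 for the "fail" class.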
