A Systematic Mapping on the Use of Data Mining for the Face-to-Face School Dropout Problem

Dropout is a critical problem that affects institutions worldwide. Data mining is an analytical approach that has been used to address it. Typically, data mining follows a structured process with the following general steps: data collection, pre-processing, pattern extraction, and post-processing (validation). Until now, it has not been known how data mining has been used to address the dropout problem in face-to-face education across all steps of this process. To fill this gap, a Systematic Literature Mapping was conducted to identify and analyze the primary studies available in the literature and answer a set of research questions. The aim was to provide an overview of the aspects related to each data mining step in this context, focusing on classes of solutions rather than on specific techniques (for example, techniques for imbalanced data rather than SMOTE). In total, 118 papers were selected, considering a period of 10 years (01/01/2010 to 31/12/2020).
