Comparative Analysis of Prediction Techniques to Determine Student Dropout: Logistic Regression vs Decision Trees

Detecting students who may drop out of an academic program is a relevant issue for universities, which has prompted efforts to examine the variables that determine student dropout. Dropout is defined in different ways; however, the studies agree that a student drops out of a course of study only when several variables combine. This study compares the performance indicators of the current dropout model of the Universidad del Bío-Bío (UBB), which is based on logistic regression, with those of a new model based on decision trees. The new model was obtained through data mining methodologies and implemented with the SAP Predictive Analytics tool. Real data from the UBB databases were used to train, validate, and apply the model. The comparison shows that the proposed model predicts student dropout with an accuracy of 86%, a precision of 97%, and an error rate of 14%, which are better indicators than the current values delivered by the model based on logistic regression. The resulting prediction model was subsequently optimized by considering other variables, which improved the prediction indicators further. Higher education institutions should take into account the variables that best explain student dropout in order to improve the retention of their students.
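
The three indicators reported above are all derived from a confusion matrix, and the error rate is simply the complement of accuracy (86% + 14% = 100%). The following Python sketch shows how they might be computed. The class name, the function name, and the counts are hypothetical and chosen only for illustration; they are not the UBB data, and the abstract does not report the underlying confusion matrix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary classifier where 'positive' means 'drops out'."""
    tp: int  # predicted dropout, actually dropped out
    fp: int  # predicted dropout, actually stayed
    fn: int  # predicted stay, actually dropped out
    tn: int  # predicted stay, actually stayed


def classification_metrics(c: ConfusionCounts) -> dict:
    """Return accuracy, error rate, and precision for the given counts."""
    total = c.tp + c.fp + c.fn + c.tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    predicted_positive = c.tp + c.fp
    return {
        # share of all students classified correctly
        "accuracy": (c.tp + c.tn) / total,
        # share of all students classified incorrectly (1 - accuracy)
        "error_rate": (c.fp + c.fn) / total,
        # share of predicted dropouts who really dropped out
        "precision": (c.tp / predicted_positive
                      if predicted_positive else float("nan")),
    }


# Hypothetical counts, invented for illustration only (not the UBB results).
counts = ConfusionCounts(tp=97, fp=3, fn=25, tn=75)
metrics = classification_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Because accuracy and error rate always sum to one, only one of the two carries independent information. Precision can be far higher than accuracy, as in the reported figures, when the model rarely raises a false alarm but still misses some students who do drop out.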
