Predicting the risk of attrition for undergraduate students with time based modelling

Improving student retention is an important and challenging problem for universities. This paper reports on the development of a student attrition model for predicting which first-year students are most at risk of leaving at various points during their first semester of study. The objective of such a model is to help universities proactively support and retain these students as their situations and risk change over time. The study evaluated models for predicting student attrition at four points in the semester: pre-enrolment, enrolment, in-semester and end-of-semester. A dataset of 23,291 students who enrolled in their first semester between 2011 and 2013 was extracted from various data sources. Three supervised machine learning techniques were tested to develop the predictive models: logistic regression, decision trees and random forests. The performance of these models was evaluated using the precision and recall metrics. Logistic regression achieved the best performance and user utility (67% precision, 29% recall). A web application was developed for users to visualise and interact with the model results, to assist in targeting student intervention responses and programs.
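As a minimal sketch of how the two evaluation metrics named above are computed, the following Python function derives precision and recall from binary labels, where 1 marks a student who left and 0 a student who stayed. The function name `precision_recall` and the toy labels are hypothetical illustrations, not the study's data or code.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, with 1 = attrition."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly flagged leavers
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # flagged, but stayed
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # left, but not flagged
    # Guard against division by zero when nothing is flagged or nobody left.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy example: 4 students left, the model flagged 2 students.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Read this way, the reported 67% precision means that about two in three flagged students did leave, while 29% recall means that fewer than a third of all leavers were flagged.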
