A multifaceted data mining approach to understanding what factors lead college students to persist and graduate

Universities in the United States face high dropout rates and low graduation rates among four-year college students. This paper describes a set of data mining approaches to help address this problem. Specifically, we use the following approaches to identify factors that contribute to student persistence and graduation: (1) a visual analysis to identify bivariate relationships and to understand the flow of students through an educational institution; (2) an ensemble feature selection method to identify factors that have a significant impact on a student's persistence and graduation; (3) classification and prediction algorithms to predict whether a student will persist in a given semester and ultimately graduate; and (4) a variety of association patterns to help education practitioners gain further insight into the factors that affect persistence and graduation. To evaluate these approaches, we use data originating from a local academic program. Our analyses produced outcomes that are both interpretable and actionable. For example, the ELM (Entry Level Mathematics) score was identified as one of the most influential factors in predicting a student's third-term persistence and, ultimately, graduation. This insight in turn motivated the program to enroll students with low ELM scores in a remedial math course before they start their freshman year. Among the classification algorithms considered in this study, we found that Naïve Bayes is more suitable for predicting graduation, whereas AdaBoost and SVM are better at predicting persistence.
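To make the idea of ensemble feature selection in approach (2) concrete, the following is a minimal sketch and not the method used in the paper. It assumes two simple scorers (absolute correlation with the outcome and information gain at a median split) and combines them by averaging each feature's rank across scorers. The scorers, the averaging rule, the feature names other than the ELM score, and all data values are hypothetical and chosen only for illustration.

```python
from math import log2
from statistics import mean, median, pstdev


def abs_correlation(xs, ys):
    """Absolute Pearson correlation between a feature and a 0/1 label."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return abs(cov / (sx * sy))


def entropy(ys):
    """Binary entropy (in bits) of a list of 0/1 labels."""
    if not ys:
        return 0.0
    p = mean(ys)
    if p in (0, 1):
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def info_gain_at_median(xs, ys):
    """Information gain from splitting the feature at its median."""
    cut = median(xs)
    above = [y for x, y in zip(xs, ys) if x > cut]
    below = [y for x, y in zip(xs, ys) if x <= cut]
    n = len(ys)
    return (entropy(ys)
            - len(above) / n * entropy(above)
            - len(below) / n * entropy(below))


def to_ranks(scores):
    """Map each feature name to its rank (1 = highest score)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(ordered)}


def ensemble_ranking(features, labels, scorers):
    """Order features by their mean rank across all scorers (assumed rule)."""
    per_scorer = [
        to_ranks({name: scorer(col, labels) for name, col in features.items()})
        for scorer in scorers
    ]
    avg_rank = {name: mean([r[name] for r in per_scorer]) for name in features}
    return sorted(avg_rank, key=avg_rank.get)


# Hypothetical toy data: 1 = persisted to the third term, 0 = did not.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "elm_score": [52, 48, 55, 50, 30, 34, 28, 36],
    "hs_gpa": [3.4, 3.2, 3.6, 2.9, 3.1, 2.8, 3.3, 2.7],
    "commute_miles": [5, 20, 12, 8, 9, 18, 6, 14],
}

ranking = ensemble_ranking(features, labels,
                           [abs_correlation, info_gain_at_median])
print(ranking)
```

Averaging ranks rather than raw scores keeps scorers with different scales comparable; other aggregation rules (for example, voting on the top-k features) would fit the same structure.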
