Comparing Data Mining Models in Academic Analytics

The goal of this study was to compare data mining techniques for predicting student graduation. The data covered first-time, full-time freshmen who entered a flagship university in the southeastern United States between 1995 and 2005, and included demographic, high school, ACT profile, and college indicators; graduation was measured on a six-year timeline. The results indicated no difference in misclassification rates among the logistic regression, decision tree, neural network, and random forest models. This suggests that institutional researchers should build and compare several data mining models and, where predictive performance is similar, choose among them based on each model's other advantages. The resulting model can be used to identify students at risk of not graduating and to help these students graduate.
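
The comparison criterion named above, the misclassification rate, is the share of students whose predicted graduation outcome differs from the observed one. The following is a minimal sketch of how such a comparison might be set up, not the study's actual procedure: the function names, the label encoding (1 = graduated within six years, 0 = did not), and the toy labels and predictions are all invented for illustration and are not data or results from the study.

```python
from typing import Dict, List, Sequence, Tuple


def misclassification_rate(actual: Sequence[int], predicted: Sequence[int]) -> float:
    """Fraction of observations whose predicted label differs from the actual label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one observation is required")
    errors = sum(1 for a, p in zip(actual, predicted) if a != p)
    return errors / len(actual)


def rank_models(
    actual: Sequence[int], predictions: Dict[str, Sequence[int]]
) -> List[Tuple[str, float]]:
    """Return (model name, misclassification rate) pairs, lowest rate first."""
    rates = {
        name: misclassification_rate(actual, preds)
        for name, preds in predictions.items()
    }
    return sorted(rates.items(), key=lambda item: item[1])


# Toy holdout labels, invented for illustration (1 = graduated, 0 = did not).
actual = [1, 0, 1, 1, 0, 1, 0, 1]

# Hypothetical predictions from four models; each makes one error on a
# different student, so all four have the same misclassification rate.
toy_predictions = {
    "logistic_regression": [1, 0, 1, 0, 0, 1, 0, 1],
    "decision_tree":       [1, 0, 1, 1, 1, 1, 0, 1],
    "neural_network":      [1, 1, 1, 1, 0, 1, 0, 1],
    "random_forest":       [1, 0, 1, 1, 0, 0, 0, 1],
}

ranking = rank_models(actual, toy_predictions)
for name, rate in ranking:
    print(f"{name}: {rate:.3f}")
```

In this toy case the rates are tied, which mirrors the situation the abstract describes: when the misclassification rate does not separate the models, the choice has to rest on other considerations, such as how easily a model can be interpreted.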
