Modern predictive models for modeling the college graduation rates

Modern predictive modeling techniques are commonly used to model a target of interest from a list of input variables. In general, these techniques can identify input variables associated with the target, but they cannot establish causal relationships between the target and the inputs, because the data are observational. Advances in technology have made data collection fast and easy; as a result, data cleansing becomes critical when predictive modeling methods are applied. This article compares ten modern predictive modeling techniques for predicting the 6-year college graduation rate. The input variables cover ‘pre-college’ performance, ‘first-year’ college performance, and various socioeconomic factors, as well as some variables related to the university learning environment. Issues of data quality and modeling technique selection are discussed, along with some pitfalls and cautions in applying predictive modeling techniques.
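As a rough illustration of what comparing predictive models on a graduation outcome can look like, the following minimal Python sketch scores two models with k-fold cross-validation. It is not the article's method: the article compares ten techniques on real institutional data, whereas this sketch uses only a majority-class baseline and a plain logistic regression, and every name, variable, and coefficient below (including the synthetic data-generating process) is a hypothetical assumption chosen for illustration.

```python
import math
import random


def make_synthetic_students(n, seed=0):
    """Generate hypothetical (features, graduated) pairs; this is not real data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        hs_gpa = rng.uniform(2.0, 4.0)       # stand-in for 'pre-college' performance
        fy_gpa = rng.uniform(1.0, 4.0)       # stand-in for 'first-year' performance
        low_income = rng.choice([0.0, 1.0])  # stand-in for a socioeconomic variable
        # Assumed data-generating process, invented purely for this sketch.
        logit = -6.0 + 0.8 * hs_gpa + 1.5 * fy_gpa - 0.5 * low_income
        p = 1.0 / (1.0 + math.exp(-logit))
        rows.append(([hs_gpa, fy_gpa, low_income], 1 if rng.random() < p else 0))
    return rows


def fit_majority(train):
    """Baseline: always predict the most common class in the training fold."""
    majority = 1 if sum(y for _, y in train) * 2 >= len(train) else 0
    return lambda x: majority


def fit_logistic(train, epochs=300, lr=0.5):
    """Logistic regression by batch gradient descent on standardized inputs."""
    n, d = len(train), len(train[0][0])
    # Standardize using training-fold statistics only, to avoid leakage.
    means = [sum(x[j] for x, _ in train) / n for j in range(d)]
    stds = [math.sqrt(sum((x[j] - means[j]) ** 2 for x, _ in train) / n) or 1.0
            for j in range(d)]

    def scale(x):
        return [(x[j] - means[j]) / stds[j] for j in range(d)]

    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, y in train:
            z = scale(x)
            p = 1.0 / (1.0 + math.exp(-(b + sum(wj * zj for wj, zj in zip(w, z)))))
            for j in range(d):
                gw[j] += (p - y) * z[j]
            gb += p - y
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n

    def predict(x):
        return 1 if b + sum(wj * zj for wj, zj in zip(w, scale(x))) > 0 else 0

    return predict


def cross_val_accuracy(fit, data, k=5):
    """Mean accuracy over k folds; each fold is held out exactly once."""
    scores = []
    for i in range(k):
        test = data[i::k]
        train = [row for j, row in enumerate(data) if j % k != i]
        model = fit(train)
        scores.append(sum(model(x) == y for x, y in test) / len(test))
    return sum(scores) / k


data = make_synthetic_students(400)
results = {
    "majority baseline": cross_val_accuracy(fit_majority, data),
    "logistic regression": cross_val_accuracy(fit_logistic, data),
}
for name, acc in results.items():
    print(f"{name}: mean CV accuracy = {acc:.3f}")
```

Scoring every candidate on the same held-out folds keeps the comparison fair, and including a trivial baseline shows how much of the accuracy comes from the inputs rather than from class imbalance. Consistent with the caution above, a fitted coefficient in such a model indicates association with graduation, not causation.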
