Comparison Of Four Methodologies For Modeling Student Retention In Engineering

Several statistical and machine learning methodologies have been applied in previous studies to model student retention. However, most prior studies relied on a single modeling method chosen by the authors, and direct comparisons of competing methods on an identical set of student retention data have rarely been reported. This paper presents a direct comparison of prominent methods for modeling student retention using the same data. Four modeling methodologies are included: neural networks, logistic regression, discriminant analysis, and structural equation modeling. Each method was applied to five retention models built from different combinations of cognitive and non-cognitive factors, ranging from 9 to 71 variables. The retention data were collected from more than 1,500 first-year engineering students at a large Midwestern university. The eleven cognitive attributes include high school GPAs, standardized test scores, and the grades earned and number of semesters taken in high school math, science, and English courses. The non-cognitive variables were collected through the Student Attitudinal Success Instrument (SASI), which covers nine constructs: Leadership, Deep Learning, Surface Learning, Teamwork, Academic Self-efficacy, Motivation, Metacognition, Expectancy-value, and Major Decision. The study yielded the following findings. First, among the five retention models, the two hybrid models combining cognitive and non-cognitive factors always performed better than the models using only cognitive or only non-cognitive factors. Second, adding non-cognitive items, when done properly, can significantly improve the predictive performance of a cognitive-only model. Third, neural network methods scored higher on the performance indices than the other three methodologies, with logistic regression ranking second. However, logistic regression may appeal to some researchers because it is easier to implement and requires less computational power. Finally, the authors found that the commonly used threshold (0.05) for admitting variables in the stepwise selection process of logistic regression may not produce the model with the best predictive performance. The authors strongly recommend that researchers explore thresholds beyond this conventional value to find the best-performing set of variables.
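The recommendation to look beyond the 0.05 entry threshold can be illustrated with a minimal sketch. It assumes plain forward selection with a likelihood-ratio entry test (statistical packages typically also apply a removal test and may use score or Wald statistics instead), a 0.5 probability cut-off, and classification accuracy on a held-out split as the performance index; the abstract does not specify any of these choices. All function names and the synthetic data generator are hypothetical, and the toy data stand in for, but are not, the cognitive and SASI variables used in the study.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    return math.exp(z) / (1.0 + math.exp(z))


def solve(A, b):
    # Gauss-Jordan elimination with partial pivoting (small Newton systems only).
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[r][n] / M[r][r] for r in range(n)]


def fit_logistic(X, y, cols, max_iter=25):
    # Newton-Raphson fit of an intercept plus the columns in `cols`; returns (beta, log-lik).
    rows = [[1.0] + [x[j] for j in cols] for x in X]
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p
        hess = [[1e-8 * (a == k) for k in range(p)] for a in range(p)]  # tiny ridge
        for r, yi in zip(rows, y):
            mu = sigmoid(sum(bj * v for bj, v in zip(beta, r)))
            w = mu * (1.0 - mu)
            for a in range(p):
                grad[a] += (yi - mu) * r[a]
                for k in range(p):
                    hess[a][k] += w * r[a] * r[k]
        step = solve(hess, grad)  # Newton step: (X'WX)^{-1} X'(y - mu)
        beta = [bj + s for bj, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-8:
            break
    ll = 0.0
    for r, yi in zip(rows, y):
        mu = min(max(sigmoid(sum(bj * v for bj, v in zip(beta, r))), 1e-12), 1 - 1e-12)
        ll += yi * math.log(mu) + (1 - yi) * math.log(1.0 - mu)
    return beta, ll


def forward_stepwise(X, y, alpha_enter):
    # Add the candidate with the smallest likelihood-ratio p-value until none is < alpha_enter.
    selected, remaining = [], list(range(len(X[0])))
    _, ll_cur = fit_logistic(X, y, selected)
    while remaining:
        cands = []
        for j in remaining:
            _, ll_new = fit_logistic(X, y, selected + [j])
            stat = max(0.0, 2.0 * (ll_new - ll_cur))
            cands.append((math.erfc(math.sqrt(stat / 2.0)), j, ll_new))  # chi-square(1) tail
        pval, j, ll_new = min(cands)
        if pval >= alpha_enter:
            break
        selected.append(j)
        remaining.remove(j)
        ll_cur = ll_new
    return selected


def threshold_sweep(X_tr, y_tr, X_va, y_va, alphas=(0.01, 0.05, 0.10, 0.20, 0.30)):
    # Select and refit on the training split; score 0.5-cut-off accuracy on validation.
    results = []
    for alpha in alphas:
        cols = forward_stepwise(X_tr, y_tr, alpha)
        beta, _ = fit_logistic(X_tr, y_tr, cols)
        rows = [[1.0] + [x[j] for j in cols] for x in X_va]
        hits = sum((sigmoid(sum(bj * v for bj, v in zip(beta, r))) >= 0.5) == (yi == 1)
                   for r, yi in zip(rows, y_va))
        results.append((alpha, cols, hits / len(y_va)))
    return results


def make_synthetic(n, seed, n_features=4):
    # Hypothetical toy data: only features 0 and 1 influence the outcome.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        X.append(x)
        y.append(1 if rng.random() < sigmoid(1.5 * x[0] - 1.0 * x[1]) else 0)
    return X, y


if __name__ == "__main__":
    data = make_synthetic(200, seed=1) + make_synthetic(200, seed=2)
    for alpha, cols, acc in threshold_sweep(*data):
        print(f"alpha={alpha:.2f}  variables={cols}  validation accuracy={acc:.3f}")
```

Because forward selection follows a single path through the candidate variables, raising the entry threshold can only admit additional variables along that path; the sweep then lets validation performance, rather than a fixed 0.05 cut-off, decide where to stop.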
