A Data Mining Approach to Understanding Curriculum-Level Factors That Help Students Persist and Graduate

This Research Full Paper describes an analysis of curriculum-level factors that affected the persistence and graduation outcomes of more than 4,000 undergraduate students at San Francisco State University. The work addressed four questions: (1) how the timing of students’ Mathematics courses affected their performance and outcomes; (2) whether students who progressed farther through the prescribed foundation course sequences of the university’s long-duration learning community program exhibited higher persistence and graduation rates; (3) which course sequences were taken most frequently, and whether students who progressed farther through those sequences exhibited higher graduation rates; and (4) whether greater progress was a more important predictor of persistence and graduation than other demographic and academic factors. We found that students who took their first non-remedial Mathematics course in their second year showed higher fifth-term and seventh-term persistence than students who took it in their first year. In addition, students who progressed farther through their chosen or prescribed sequences consistently exhibited higher persistence and graduation rates. Finally, a student’s persistence was a more reliable predictor of graduation than the other features considered. Overall, these findings could inform an institution’s strategies for maximizing persistence and graduation by emphasizing students’ progress through the curriculum.
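
The abstract does not state exactly how "progress through a sequence" was measured. The sketch below illustrates one plausible reading and should not be taken as the paper's method. It assumes progress is the number of leading courses of a sequence that a student completed in order, with other courses allowed in between, and then tabulates graduation rates by progress level. The course codes (`FOUND_1`, `FOUND_2`, `FOUND_3`, `ELECT_A`) and student records are hypothetical toy data, not the study's data.

```python
from collections import defaultdict


def progress_through_sequence(taken, sequence):
    """Count how many leading courses of `sequence` appear in `taken`, in order.

    Assumption (not from the paper): `taken` lists a student's courses in
    chronological order, and other courses may be interleaved between the
    steps of the sequence (a subsequence match).
    """
    step = 0
    for course in taken:
        if step < len(sequence) and course == sequence[step]:
            step += 1
    return step


def graduation_rate_by_progress(students, sequence):
    """Map each progress level to the fraction of students at that level who graduated.

    `students` is an iterable of (courses_taken, graduated) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # progress -> [graduated, total]
    for taken, graduated in students:
        p = progress_through_sequence(taken, sequence)
        counts[p][0] += int(graduated)
        counts[p][1] += 1
    return {p: g / n for p, (g, n) in sorted(counts.items())}


# Hypothetical prescribed sequence and toy student records.
sequence = ["FOUND_1", "FOUND_2", "FOUND_3"]
students = [
    (["FOUND_1", "ELECT_A", "FOUND_2", "FOUND_3"], True),  # progress 3
    (["FOUND_1", "FOUND_2"], True),                        # progress 2
    (["FOUND_1", "FOUND_2"], False),                       # progress 2
    (["FOUND_2", "FOUND_1"], False),                       # progress 1 (out of order)
    ([], False),                                           # progress 0
]

print(graduation_rate_by_progress(students, sequence))
# → {0: 0.0, 1: 0.0, 2: 0.5, 3: 1.0}
```

Using an in-order subsequence match rather than simple set membership keeps a student who took the second course before the first from being credited with full progress. Persistence rates could be tabulated the same way by replacing the `graduated` flag with a fifth-term or seventh-term enrollment flag.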
