Contrasting prediction methods for early warning systems at undergraduate level

Abstract

Recent studies have provided evidence in favour of adopting early warning systems as a means of identifying at-risk students. Our study examines eight prediction methods and investigates the optimal time in a course to apply such a system. We present findings from a university statistics course that has weekly continuous assessment and a large proportion of its resources on the Blackboard Learning Management System. We identify weeks 5–6 (halfway through the semester) as an optimal time to implement an early warning system: this allows students time to change their study patterns while retaining reasonable prediction accuracy. Using detailed variables, clustering and our final prediction method of BART (Bayesian Additive Regression Trees), we can predict students' final marks by week 6 to within 6.5 percentage points on mean absolute error. We provide our R code implementing the prediction methods in a GitHub repository.

Abbreviations: Bayesian Additive Regression Trees (BART); Random Forests (RF); Principal Components Regression (PCR); Multivariate Adaptive Regression Splines (Splines); K-Nearest Neighbours (KNN); Neural Networks (NN); and Support Vector Machine (SVM).
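
As an illustration of the final modelling step described above, the following is a minimal R sketch, not the authors' published pipeline (see their GitHub repository for that). It fits a BART model with the bartMachine package to a simulated stand-in for the week-6 predictor variables and reports mean absolute error on a held-out set; the data frame week6, its columns, and the simulated final_mark are all hypothetical.

```r
# Minimal sketch: predicting final marks from hypothetical week-6 variables
# with BART, then computing mean absolute error on a hold-out set.
library(bartMachine)  # Bayesian Additive Regression Trees (requires rJava)

set.seed(123)

# Hypothetical stand-in for the real week-6 LMS and assessment variables
n <- 200
week6 <- data.frame(
  logins        = rpois(n, 30),     # LMS logins to date
  videos_viewed = rpois(n, 12),     # recorded lectures watched
  ca_mean       = runif(n, 0, 100)  # mean continuous-assessment mark so far
)
final_mark <- pmin(100, pmax(0, 0.6 * week6$ca_mean +
                                0.5 * week6$videos_viewed +
                                rnorm(n, 20, 8)))

# Hold out 25% of students for evaluation
test_idx <- sample(n, size = n %/% 4)
fit <- bartMachine(X = week6[-test_idx, ], y = final_mark[-test_idx])

# Mean absolute error, in percentage points of the final mark
pred <- predict(fit, week6[test_idx, ])
mean(abs(pred - final_mark[test_idx]))
```

The same train/hold-out structure can be reused to compare the other methods (RF, PCR, Splines, KNN, NN, SVM) on a common error scale.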
