Why Are Predictions of At-Risk Students Not 100% Accurate? Showing Patterns in False Positive and False Negative Predictions

Predictive modelling focused on identifying students at risk of failing has become one of the most prevalent topics in Learning Analytics and Educational Data Mining. Most published work concentrates on training machine learning models that achieve the highest prediction performance, as measured by various metrics. Nevertheless, limited work examines the behaviour of these models, in particular the analysis of the errors they make in their predictions. This poster presents preliminary results that fill this gap by providing a methodology for finding patterns of errors, both in False Positives and in False Negatives. We show results from the task of predicting students at risk of not submitting their first assignment, across 48 first-year STEM courses, reported separately for False Positives and False Negatives. Erroneous predictions that cannot be explained will inform subsequent qualitative analysis, i.e., interviews with students.
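
To make the error-analysis step concrete, the sketch below shows one way such a partition of predictions into False Positives and False Negatives could be implemented. This is an illustrative example, not the authors' published pipeline: the input file, the choice of classifier, and column names such as "at_risk" are hypothetical placeholders.

```python
# A minimal sketch (assumed setup, not the poster's actual pipeline) of
# splitting test-set predictions into False Positives and False Negatives
# before searching each group for patterns.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical data: one row per student, behavioural features plus a
# binary label (1 = did not submit the first assignment, 0 = submitted).
df = pd.read_csv("students.csv")  # placeholder input file
X, y = df.drop(columns=["at_risk"]), df["at_risk"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Partition the test set by error type.
false_positives = X_test[(y_pred == 1) & (y_test == 0)]  # flagged, but submitted
false_negatives = X_test[(y_pred == 0) & (y_test == 1)]  # missed non-submitters

# A first step towards characterising error patterns: compare the feature
# distributions of each error group against those of the full test set.
print(false_positives.describe())
print(false_negatives.describe())
```

Each error group can then be analysed separately, since the features that explain False Positives (students flagged as at risk who in fact submitted) need not be the same as those that explain False Negatives (students the model failed to flag).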