Predicting and preventing student failure - using the k-nearest neighbour method to predict student performance in an online course environment

We study the problem of predicting student performance in an online course. Our specific goal is to identify, at an early stage of the course, those students who have a high risk of failing. We apply the k-nearest neighbour method (KNN) and several of its variants to this problem. We present extensive experimental results from a 12-lesson course on touch-typing, with a database of close to 15,000 students. The results indicate that KNN can predict student performance accurately, even after only the first few lessons. We conclude that early skill tests can also be strong predictors of final scores in other skill-based courses. Selected methods described in this paper will be implemented as an early warning feature for teachers of the touch-typing course, so they can quickly focus their attention on the students who need help the most.
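To make the idea concrete, the following is a minimal sketch (not the authors' implementation) of distance-weighted KNN regression that predicts a student's final course score from early-lesson results. The feature layout (typing speed and error rate from the first lessons), the sample data, and the at-risk threshold are assumptions chosen for illustration only.

```python
import numpy as np

def knn_predict_final_score(train_features, train_final_scores, query, k=5):
    """Predict a final score from the k most similar past students.

    train_features: (n_students, n_features) array of early-lesson results,
        e.g. typing speed and error rate from the first lessons (assumed layout).
    train_final_scores: (n_students,) array of final course scores.
    query: (n_features,) feature vector of the new student.
    """
    # Euclidean distances from the query student to all past students.
    # In practice features should be scaled so no single feature dominates.
    distances = np.linalg.norm(train_features - query, axis=1)
    # Indices of the k nearest neighbours.
    nearest = np.argsort(distances)[:k]
    # Distance-weighted average of the neighbours' final scores
    # (in the spirit of Dudani's distance-weighted k-NN rule); a small
    # epsilon avoids division by zero for exact matches.
    weights = 1.0 / (distances[nearest] + 1e-9)
    return np.average(train_final_scores[nearest], weights=weights)

# Usage with made-up data: 6 past students, 2 early-lesson features
# (typing speed, error rate), and their known final scores.
X = np.array([[30, 0.10], [45, 0.05], [25, 0.20],
              [50, 0.03], [35, 0.12], [28, 0.15]], dtype=float)
y = np.array([55.0, 80.0, 40.0, 90.0, 60.0, 45.0])

new_student = np.array([32.0, 0.11])
predicted = knn_predict_final_score(X, y, new_student, k=3)

# Flag the student for the teacher if the predicted score is below a
# (hypothetical) passing threshold.
at_risk = predicted < 50.0
print(f"predicted final score: {predicted:.1f}, at risk: {at_risk}")
```

A distance-weighted average lets closer (more similar) past students count more toward the prediction than distant ones, which is one of the common KNN variants for this kind of early-warning task.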
