Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100

This paper proposes a model for predicting students' performance levels that employs three machine learning algorithms: an instance-based learning classifier, a decision tree, and Naive Bayes. In addition, three decision schemes were used to combine the outputs of these algorithms in different ways to investigate whether better classification performance could be achieved. The experiment consists of two phases, training and testing, which were conducted at three steps corresponding to different stages of the semester. At each step the number of attributes in the dataset was increased, and all attributes were included at the final stage. An important characteristic of the dataset is that it contains only time-varying attributes and no time-invariant attributes such as gender or age. This type of dataset helped to determine to what extent time-invariant data has a significant effect on prediction accuracy. The experimental results were evaluated in terms of overall accuracy, sensitivity, and precision, and are discussed in comparison with results reported in the relevant literature.
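
As an illustration of how the outputs of three classifiers might be combined and scored, the following minimal sketch applies a simple majority vote and then computes overall accuracy, sensitivity, and precision. Majority voting is an assumption made here for illustration; the three decision schemes themselves are not specified above. The labels "at_risk" and "ok", the function names, and the toy predictions are likewise hypothetical and are not data or results from the study.

```python
from collections import Counter


def majority_vote(*predictions):
    """Combine per-classifier label lists by taking the most common label per student.

    Assumed decision scheme (illustrative only): simple majority voting.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def evaluate(y_true, y_pred, positive="at_risk"):
    """Return overall accuracy, sensitivity (recall), and precision for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }


# Toy labels for six hypothetical students (not data from the study).
y_true = ["at_risk", "at_risk", "at_risk", "ok", "ok", "ok"]
instance_based = ["at_risk", "at_risk", "ok", "ok", "ok", "at_risk"]
decision_tree = ["at_risk", "ok", "at_risk", "ok", "at_risk", "ok"]
naive_bayes = ["at_risk", "ok", "ok", "ok", "ok", "ok"]

combined = majority_vote(instance_based, decision_tree, naive_bayes)
metrics = evaluate(y_true, combined)
print(combined)
print(metrics)
```

Sensitivity is the measure of most interest when the goal is to identify at-risk students, since it reflects how many truly at-risk students the combined prediction recovers; precision indicates how many of the flagged students were actually at risk.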