Investigation of College Dropout with the Fuzzy C-Means Algorithm

Up to 50% of students in Brazilian universities drop out before completing their degree. Because the student population is heterogeneous, it is difficult to determine the main causes of this high dropout rate. In this paper, we apply the Fuzzy C-Means algorithm to a dataset of real-world records from the Biology undergraduate course at Brazilian universities. We use transactional distance theory to select the set of variables used in the clustering process. The results indicate that the data are best divided into five groups. We observe that Fuzzy C-Means forms groups according to how engaged the students are, and that each group contains two subgroups: students who drop out of the course and students who do not. The type of analysis presented in this work can provide institutions with input for establishing new policies to reduce the dropout rate.
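For readers unfamiliar with the method, the following is a minimal sketch of the standard Fuzzy C-Means update rules in Python with NumPy. It is an illustration only, not the implementation used in this work: the function name `fuzzy_c_means`, the fuzzifier `m = 2.0`, the stopping tolerance, and the random initialization are all assumptions, and only the choice of five clusters comes from the results reported above.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters=5, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Illustrative Fuzzy C-Means (hypothetical helper, not the paper's code).

    X is an (n_samples, n_features) array. Returns the cluster centers and
    the membership matrix U, where U[i, k] is the degree to which sample i
    belongs to cluster k and each row of U sums to 1.
    """
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]

    # Assumed initialization: random memberships, normalized per sample.
    U = rng.random((n_samples, n_clusters))
    U /= U.sum(axis=1, keepdims=True)

    for _ in range(max_iter):
        # Centers are means of the samples weighted by memberships ** m.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]

        # Euclidean distance from every sample to every center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against division by zero

        # Membership update: proportional to d ** (-2 / (m - 1)).
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)

        # Stop when memberships no longer change appreciably.
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break

    return centers, U
```

Unlike a hard clustering, each student record receives a degree of membership in every group, so a record can sit between two engagement profiles. If a single label per record is needed, `U.argmax(axis=1)` gives the group with the highest membership.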