Detection of Potentially Students Drop Out of College in Case of Missing Value Using C4.5

The reputation of a university can be determined by the number of students who drop out, a problem experienced by many universities in Indonesia. Although this problem has been studied by many researchers, the data acquisition and the attributes used were not well explained. This study aims to identify the reasons behind student dropout using a machine learning technique. The challenges in preprocessing the primary dataset are missing values, balanced class distribution, and a variety of data types. Two classes are used: dropout and graduate students. Analyzing the missing values can help reveal why students drop out and which students have the potential to drop out. To address the balanced class distribution, a decision tree algorithm is utilized, and to handle the variety of data types, we use C4.5. The results show that the configuration with 20 attributes and stratified sampling performs best among all datasets and experiments, with average AUC, accuracy, precision, and recall of 0.98, 96.87%, 98.75%, and 97.84%, respectively. This indicates that the proposed method is suitable for predicting student dropout when the class distribution is balanced, despite the presence of missing values.
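The abstract does not state how C4.5 handles the missing values while the tree is built. As a minimal sketch, assuming Quinlan's standard C4.5 treatment rather than the authors' exact configuration, the gain ratio of a categorical attribute can be computed from the cases whose value is known, scaled by the fraction F of known cases, with the missing cases counted as one extra group in the split information. The function names `entropy` and `gain_ratio` and the use of `None` to mark a missing value are illustrative choices, not taken from the paper.

```python
from collections import Counter
from math import log2
from typing import Hashable, Optional, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(values: Sequence[Optional[Hashable]], labels: Sequence[Hashable]) -> float:
    """Gain ratio of a categorical attribute; None marks a missing value.

    Assumption: standard C4.5 missing-value treatment (Quinlan, 1993), which is
    not necessarily the setup used in the paper.
    """
    n = len(values)
    known = [(v, y) for v, y in zip(values, labels) if v is not None]
    if not known:
        return 0.0
    f_known = len(known) / n  # fraction F of cases with a known value

    # Group the labels of the known cases by attribute value.
    by_value = {}
    for v, y in known:
        by_value.setdefault(v, []).append(y)

    # Information gain on the known cases only, scaled by F.
    remainder = sum(len(ys) / len(known) * entropy(ys) for ys in by_value.values())
    gain = f_known * (entropy([y for _, y in known]) - remainder)

    # Split information over all n cases, with missing values as one extra group.
    group_sizes = [len(ys) for ys in by_value.values()]
    n_missing = n - len(known)
    if n_missing:
        group_sizes.append(n_missing)
    split_info = -sum((s / n) * log2(s / n) for s in group_sizes)
    return gain / split_info if split_info > 0 else 0.0
```

For example, a hypothetical attribute `scholarship` with values `["yes", "yes", "no", "no", None]` and labels `["graduate", "graduate", "dropout", "dropout", "graduate"]` separates its known cases perfectly, yet `gain_ratio` returns about 0.526 rather than 1.0, because one case in five is missing and the extra group enlarges the split information. This penalty keeps an attribute with many gaps from being chosen too eagerly as a split.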
