Clarification of the Random Forest Algorithm in an Educational Field

In recent years, many decision support systems built on classification algorithms have been designed as black boxes that hide their inner workings from users. This lack of explanation creates a practical problem. Education is one of the fields where such systems need to be more transparent, so that users have the information required to make sound decisions. In this paper, the Random Forest algorithm is clarified and applied to a dataset of students' performance. The clarified method achieved an accuracy of 83.56%, whereas the WEKA tool achieved 80.82% with the same algorithm and dataset. The proposed method was also tested on a dataset from a previous study, where it achieved an accuracy of 92.65%, compared with the 91.2% reported by that study. Furthermore, to make the Random Forest algorithm work as a white box, rules were extracted from the black-box model to make it more interpretable and more helpful in predicting students' performance.
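The abstract does not describe how the rules are extracted, so the following is only a minimal sketch of one common approach, not the authors' procedure: walk every tree in the forest from root to leaf and emit one IF-THEN rule per path. The `Node` class, the helper functions, the feature names (`absences`, `study_hours`, `previous_grade`) and the thresholds are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Node:
    """Hypothetical tree node. Internal nodes test `feature <= threshold`;
    leaves carry only a class `label`."""
    feature: Optional[str] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None    # branch taken when feature <= threshold
    right: Optional["Node"] = None   # branch taken when feature > threshold
    label: Optional[str] = None


Rule = Tuple[List[str], str]  # (list of conditions, predicted class)


def extract_rules(node: Node, conditions: Tuple[str, ...] = ()) -> List[Rule]:
    """Return one (conditions, label) pair for each root-to-leaf path."""
    if node.label is not None:  # leaf reached: the path so far is one rule
        return [(list(conditions), node.label)]
    left_cond = f"{node.feature} <= {node.threshold}"
    right_cond = f"{node.feature} > {node.threshold}"
    return (extract_rules(node.left, conditions + (left_cond,))
            + extract_rules(node.right, conditions + (right_cond,)))


def forest_rules(trees: Sequence[Node]) -> List[Rule]:
    """Collect the rules of every tree in the forest into one list."""
    rules: List[Rule] = []
    for tree in trees:
        rules.extend(extract_rules(tree))
    return rules


def format_rule(conditions: List[str], label: str) -> str:
    """Render a rule as a readable IF-THEN string."""
    premise = " AND ".join(conditions) if conditions else "TRUE"
    return f"IF {premise} THEN class = {label}"


# Toy two-tree forest with made-up features and thresholds (assumption).
tree_a = Node("absences", 10,
              left=Node("study_hours", 2,
                        left=Node(label="fail"),
                        right=Node(label="pass")),
              right=Node(label="fail"))
tree_b = Node("previous_grade", 9.5,
              left=Node(label="fail"),
              right=Node(label="pass"))

for conditions, label in forest_rules([tree_a, tree_b]):
    print(format_rule(conditions, label))
```

A real forest usually yields many overlapping rules, so a practical pipeline would typically also rank, prune, or merge them (for example by coverage and accuracy on held-out data) before presenting them to users; that step is omitted here.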
