Decision trees for predicting the academic success of students

The aim of this paper is to create a model that classifies students into one of two categories according to their success at the end of their first academic year, and to identify the variables that meaningfully affect that success. The model is based on data on student success in high school and in their first-year courses, as well as the preference rank that students assigned to the observed faculty. Building the model required collecting data on all undergraduate students enrolled in their second year at the Faculty of Economics, University of Osijek, together with data on their completion of the state exam. These two datasets were combined and used for the model. Several classification algorithms for constructing decision trees were compared, and the statistical significance of the results was analyzed with a t-test. The algorithm that produced the highest accuracy was then chosen as the most successful for modeling the academic success of students. The highest total classification rate, 79%, was produced by the REPTree decision tree algorithm, but that tree was not equally successful in classifying both classes. Therefore, the average classification rate across the two classes was calculated for the two models with the highest total classification rate, and the model based on the J48 algorithm achieved the higher average. The most significant variables were total points in the state exam, points from high school, and points in the Croatian language exam.
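The distinction between the total classification rate and the average per-class rate can be illustrated with a minimal Python sketch. The confusion matrices below are hypothetical numbers chosen only to show the effect; they are not the paper's data, and the names `total_rate` and `average_class_rate` are illustrative rather than taken from the study.

```python
def total_rate(cm):
    """Share of all instances classified correctly (overall accuracy)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def average_class_rate(cm):
    """Mean of the per-class correct-classification rates."""
    rates = [cm[i][i] / sum(cm[i]) for i in range(len(cm))]
    return sum(rates) / len(rates)


# Rows are actual classes, columns are predicted classes.
# HYPOTHETICAL counts, not results from the paper.
model_a = [[90, 10],   # class 0: 90 of 100 correct
           [30, 20]]   # class 1: 20 of 50 correct
model_b = [[80, 20],   # class 0: 80 of 100 correct
           [22, 28]]   # class 1: 28 of 50 correct

for name, cm in (("model_a", model_a), ("model_b", model_b)):
    print(name, round(total_rate(cm), 3), round(average_class_rate(cm), 3))
```

In this made-up case, `model_a` has the higher total rate while `model_b` has the higher average per-class rate, which is the kind of situation in which comparing models on the class-wise average can change which one is preferred.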
