Football Match Prediction with Tree Based Model Classification

This paper presents football match prediction using tree-based algorithms (C5.0, Random Forest, and Extreme Gradient Boosting). A backward wrapper method was applied for feature selection, to identify the feature subset that most improves model accuracy. The study used 10 seasons of English Premier League match history (2007/2008 – 2016/2017) with 15 initial features to predict match results. Tuning improved the accuracy of every model. Random Forest achieved the best accuracy at 68.55%, Extreme Gradient Boosting reached 67.89%, and C5.0 had the lowest at 64.87%. Based on these results, the study concludes that decision-tree-based algorithms are not good enough for predicting football match results.
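To make the feature-selection step concrete, the following is a minimal sketch of greedy backward wrapper elimination, not the authors' implementation. The feature names and the `toy_score` function are hypothetical stand-ins: in a real pipeline `evaluate` would return the cross-validated accuracy of a tree-based classifier (for example C5.0, Random Forest, or Extreme Gradient Boosting) trained on the given feature subset.

```python
from typing import Callable, Sequence, Tuple


def backward_wrapper_selection(
    features: Sequence[str],
    evaluate: Callable[[Tuple[str, ...]], float],
    min_features: int = 1,
) -> Tuple[Tuple[str, ...], float]:
    """Greedy backward elimination.

    Start from the full feature set and repeatedly drop the single feature
    whose removal yields the highest score. Stop when no removal improves
    the score or when only `min_features` remain.
    """
    current = tuple(features)
    best_score = evaluate(current)
    while len(current) > min_features:
        # Build every subset obtained by removing exactly one feature.
        candidates = [
            tuple(f for f in current if f != dropped) for dropped in current
        ]
        scores = [evaluate(c) for c in candidates]
        top = max(range(len(candidates)), key=scores.__getitem__)
        if scores[top] <= best_score:
            break  # no single removal helps any more
        current, best_score = candidates[top], scores[top]
    return current, best_score


# Hypothetical feature names, used only for illustration (assumption:
# the paper's 15 features are not listed in this abstract).
FEATURES = ["home_form", "away_form", "goal_diff", "kit_colour", "kickoff_minute"]
USEFUL = {"home_form", "away_form", "goal_diff"}


def toy_score(subset: Tuple[str, ...]) -> float:
    """Stand-in for cross-validated accuracy (assumption, not real data):
    informative features raise the score, uninformative ones lower it."""
    gain = sum(0.10 for f in subset if f in USEFUL)
    penalty = sum(0.02 for f in subset if f not in USEFUL)
    return 0.33 + gain - penalty


selected, score = backward_wrapper_selection(FEATURES, toy_score)
print(selected, round(score, 2))
```

Because the wrapper retrains and rescores the model for every candidate subset, its cost grows with the number of features times the cost of one evaluation, which is one reason it is practical for a feature set as small as the 15 used here.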
