Towards smart-data: Improving predictive accuracy in long-term football team performance

Abstract Despite recent promising developments with large datasets and machine learning, discovering all key relationships between factors of interest through automation alone remains a challenging task. Indeed, in many real-world domains, experts can often identify key relationships that data alone may fail to reveal, no matter how large the dataset. Hence, while pure machine learning provides obvious benefits, these benefits may come at a cost in accuracy. Here we focus on what we call smart-data: a method that supports data engineering and knowledge engineering approaches which place greater emphasis on applying causal knowledge and real-world ‘facts’ to the process of model development, driven by what data are really required for prediction rather than by what data are available. We demonstrate how we exploited knowledge to develop a model that generates accurate predictions of the evolving performance of football teams based on limited data. The model enables us to predict, before a season starts, the total league points a team is expected to accumulate throughout the season. The results compare favourably against a number of other relevant and different types of models, and are on par with some other models that use far more data. The model results also provide a novel and comprehensive attribution study of the factors most influencing change in team performance, and partly address the cause of the widely accepted favourite-longshot bias observed in bookmakers' odds.
