Sports Prediction and Betting Models in the Machine Learning Age: The Case of Tennis

Machine learning and its numerous variants have become established tools in many areas of society. Several attempts have been made to apply machine learning to predicting the outcome of professional sports events and to exploiting "inefficiencies" in the corresponding betting markets. Using tennis as an example, this paper extends previous research by conducting one of the most extensive studies of its kind, applying a wide range of machine learning techniques to men's and women's professional singles matches. The paper shows that the average prediction accuracy cannot be increased beyond about 70%. Irrespective of the model used, most of the relevant information is already embedded in the betting markets, and adding other match- and player-specific data does not lead to any significant improvement. Returns from applying the predictions to the sports betting market are highly volatile and mainly negative over the longer term. This conclusion holds across most tested models, across various money management strategies, and whether backing match favorites or outsiders. Model ensembles that combine the predictions from multiple approaches prove to be the most promising choice.
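As a rough illustration of the quantities discussed above, the following minimal sketch shows how bookmaker odds can be turned into implied win probabilities, how several model outputs might be averaged into a simple ensemble, and how the return of a flat-stake betting strategy can be computed. The function names, the basic normalization of inverse odds, the unweighted average, and the flat-stake rule are all assumptions chosen for illustration; they are not necessarily the methods used in the paper.

```python
def implied_probabilities(odds_a, odds_b):
    """Convert two decimal odds into probabilities that sum to 1.

    Assumption: basic normalization of inverse odds, one of several
    possible ways to remove the bookmaker's margin.
    """
    raw_a, raw_b = 1.0 / odds_a, 1.0 / odds_b
    total = raw_a + raw_b  # exceeds 1 by the bookmaker's margin
    return raw_a / total, raw_b / total


def ensemble_probability(probabilities):
    """Combine win probabilities from several models by an unweighted mean
    (a hypothetical ensemble rule chosen for simplicity)."""
    return sum(probabilities) / len(probabilities)


def flat_stake_return(bets, stake=1.0):
    """Return on investment for a list of (decimal_odds, won) bets,
    staking the same amount on every bet."""
    profit = sum(stake * (odds - 1.0) if won else -stake for odds, won in bets)
    return profit / (stake * len(bets))


if __name__ == "__main__":
    p_a, p_b = implied_probabilities(1.5, 2.5)
    print(p_a, p_b)
    print(ensemble_probability([0.6, 0.7, p_a]))
    print(flat_stake_return([(1.5, True), (2.5, False), (1.5, True)]))
```

In this toy example, two winning bets on a favorite at odds of 1.5 are exactly cancelled by a single losing bet, which hints at why high accuracy on favorites does not by itself translate into positive returns.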
