Searching for the GOAT of tennis win prediction

Abstract: Sports forecasting models, beyond their interest to bettors, are important resources for sports analysts and coaches. Like the best athletes, the best forecasting models should be rigorously tested and judged by how well their performance holds up against top competitors. Although a number of models have been proposed for predicting match outcomes in professional tennis, their comparative performance is largely unknown. The present paper tests the predictive performance of 11 published forecasting models on the outcomes of 2395 singles matches during the 2014 season of the Association of Tennis Professionals Tour. The evaluated models fall into three categories: regression-based, point-based, and paired comparison models. Bookmaker predictions were used as a performance benchmark. When only 1 year of prior performance data was used, regression models based on player ranking and an Elo approach developed by FiveThirtyEight were the most accurate. The FiveThirtyEight model predictions had an accuracy of 75% for matches of the most highly ranked players, which was competitive with the bookmakers. The inclusion of career-to-date data improved the FiveThirtyEight model predictions for lower-ranked players (from 59% to 64%) but did not change the performance for higher-ranked players. All models were 10–20 percentage points less accurate at predicting match outcomes among lower-ranked players than among the top players in the sport. The gap in performance according to player ranking and the simplicity of the information used in Elo ratings highlight directions for further model development that could improve the practical utility and generalizability of forecasting in tennis.
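To illustrate how little information an Elo rating needs, the sketch below implements the textbook Elo expected-score and update rule. It is a minimal illustration, not the FiveThirtyEight model evaluated in the paper: the function names, the fixed K-factor of 32, and the 400-point scale are assumptions taken from the standard chess formulation.

```python
def elo_win_probability(rating_a, rating_b):
    """Textbook Elo expected score: probability that player A beats player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(winner_rating, loser_rating, k=32.0):
    """Return updated (winner, loser) ratings after one match.

    k is a hypothetical constant K-factor chosen for illustration;
    it is not the value used by any model in the paper.
    """
    expected_win = elo_win_probability(winner_rating, loser_rating)
    delta = k * (1.0 - expected_win)  # larger gain for an unexpected win
    return winner_rating + delta, loser_rating - delta


# Two players start level; the only inputs are who played and who won.
p = elo_win_probability(1500.0, 1500.0)
new_winner, new_loser = elo_update(1500.0, 1500.0)
print(p, new_winner, new_loser)
```

The only inputs are the two current ratings and the match result, which is the simplicity the abstract refers to.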
