Using random forests to estimate win probability before each play of an NFL game

Abstract

Before any play of a National Football League (NFL) game, the probability that a given team will win depends on many situational variables (such as time remaining, yards to go for a first down, field position, and current score) as well as on the relative quality of the two teams, as quantified by the Las Vegas point spread. We use a random forest method to combine pre-play variables into an estimate of win probability (WP) before any play of an NFL game. When a subset of NFL play-by-play data for the 12 seasons from 2001 to 2012 is used as a training dataset, our method provides WP estimates that resemble true win probability and accurately predict game outcomes, especially in the later stages of games. In addition to being intrinsically interesting in real time to observers of an NFL game, our WP estimates can provide useful evaluations of plays and, in some cases, of coaching decisions.
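To make the idea of combining pre-play variables with a random forest concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not give the exact variable list, hyperparameters, or software used. The plays are simulated from a made-up rule, only three of the named variables are included (score difference, time remaining, point spread), and every function name and parameter value here is hypothetical. Each tree is a regression tree grown on a bootstrap sample of plays, with a random subset of variables tried at each split; the forest's WP estimate is the average, over trees, of the fraction of winning plays in the leaf a situation falls into.

```python
import math
import random


def simulate_play(rng):
    """Return ((score_diff, seconds_left, spread), won) for one made-up play."""
    score_diff = rng.randint(-21, 21)
    seconds_left = rng.randint(0, 3600)
    spread = rng.uniform(-10.0, 10.0)  # assumed sign: positive = team favored
    frac = seconds_left / 3600.0
    # Toy ground truth (an assumption): a lead matters more as time runs out.
    z = (score_diff + spread * frac) / (2.0 + 8.0 * frac)
    p_win = 1.0 / (1.0 + math.exp(-z))
    return (score_diff, seconds_left, spread), 1 if rng.random() < p_win else 0


def sse(ys):
    """Sum of squared deviations of ys from their mean."""
    total = sum(ys)
    return sum(y * y for y in ys) - total * total / len(ys)


def build_tree(rows, rng, depth=0, max_depth=6, min_leaf=20, n_try=2):
    """Grow a regression tree; a leaf is a float, a split is (j, t, left, right)."""
    mean = sum(y for _, y in rows) / len(rows)
    if depth >= max_depth or len(rows) < 2 * min_leaf:
        return mean
    best = None
    # Try only a random subset of the variables at this node.
    for j in rng.sample(range(len(rows[0][0])), n_try):
        values = sorted({x[j] for x, _ in rows})
        step = max(1, len(values) // 10)  # roughly ten candidate thresholds
        for t in values[step::step]:
            left = [y for x, y in rows if x[j] < t]
            right = [y for x, y in rows if x[j] >= t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, j, t)
    if best is None:
        return mean
    _, j, t = best
    left_rows = [r for r in rows if r[0][j] < t]
    right_rows = [r for r in rows if r[0][j] >= t]
    return (j, t,
            build_tree(left_rows, rng, depth + 1, max_depth, min_leaf, n_try),
            build_tree(right_rows, rng, depth + 1, max_depth, min_leaf, n_try))


def predict_tree(tree, x):
    """Follow splits until a leaf (a float) is reached."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if x[j] < t else right
    return tree


def train_forest(rows, rng, n_trees=25):
    """Fit each tree on a bootstrap resample of the plays."""
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]
        forest.append(build_tree(sample, rng))
    return forest


def predict_wp(forest, x):
    """Average the per-tree leaf means to get a WP estimate in [0, 1]."""
    return sum(predict_tree(tree, x) for tree in forest) / len(forest)


rng = random.Random(0)
plays = [simulate_play(rng) for _ in range(1500)]
forest = train_forest(plays, rng)

# Up 14 vs. down 14 with one minute left, even point spread.
wp_lead = predict_wp(forest, (14, 60, 0.0))
wp_trail = predict_wp(forest, (-14, 60, 0.0))
print(f"WP when leading by 14: {wp_lead:.2f}; when trailing by 14: {wp_trail:.2f}")
```

Because each leaf value is an average of 0/1 game outcomes, the forest output can be read directly as a probability. Real play-by-play data would replace `simulate_play`, and the minimum leaf size and number of trees would need tuning.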
