Route identification in the National Football League

Abstract: Tracking data in the National Football League (NFL) is a sequence of spatio-temporal measurements whose length varies with the duration of the play. In this paper, we demonstrate how model-based curve clustering of observed player trajectories can be used to identify the routes run by eligible receivers on offensive passing plays. We use a Bernstein polynomial basis to represent cluster centers and the Expectation-Maximization (EM) algorithm to learn route labels for each of the 33,967 routes run on the 6,963 passing plays in the data set. With few assumptions and no pre-existing labels, our algorithm closely recreates the standard route tree. We go on to suggest ideas for new receiver metrics that account for receiver deployment and movement patterns common throughout the league. The resulting route labels can also be paired with game film to enable streamlined film queries.
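As a rough illustration of the approach described in the abstract, the following Python sketch clusters variable-length 2D trajectories with an EM algorithm whose cluster centers are Bernstein polynomial curves. It is not the paper's implementation. Several choices are assumptions made only for this example: time within each route is rescaled to [0, 1], the noise is isotropic Gaussian with one variance per cluster, the polynomial degree defaults to 3, and the centers are initialized by a simple farthest-point rule. The function names `bernstein_basis` and `fit_route_clusters` are hypothetical.

```python
import math
import numpy as np

def bernstein_basis(t, degree):
    """Design matrix B[i, k] = C(degree, k) * t_i^k * (1 - t_i)^(degree - k)."""
    t = np.asarray(t, dtype=float)[:, None]
    k = np.arange(degree + 1)[None, :]
    binom = np.array([math.comb(degree, j) for j in range(degree + 1)], dtype=float)
    return binom * t**k * (1.0 - t) ** (degree - k)

def fit_route_clusters(trajectories, n_clusters, degree=3, n_iter=25):
    """EM for a mixture of Bernstein-polynomial curves (illustrative sketch).

    trajectories: list of (n_i, 2) arrays of (x, y) positions; n_i may vary.
    Returns hard labels and an array of control points per cluster.
    """
    # Assumption: each route's time axis is rescaled to [0, 1].
    bases = [bernstein_basis(np.linspace(0.0, 1.0, len(Y)), degree) for Y in trajectories]
    n = len(trajectories)

    # Assumed initialization: least-squares fit per route, then farthest-point seeds.
    per_route = [np.linalg.lstsq(B, Y, rcond=None)[0] for B, Y in zip(bases, trajectories)]
    seeds = [0]
    while len(seeds) < n_clusters:
        dist = [min(np.linalg.norm(c - per_route[s]) for s in seeds) for c in per_route]
        seeds.append(int(np.argmax(dist)))
    coefs = [per_route[s].copy() for s in seeds]
    sigma2 = np.ones(n_clusters)
    weights = np.full(n_clusters, 1.0 / n_clusters)

    for _ in range(n_iter):
        # E-step: posterior probability of each cluster for each route.
        log_r = np.empty((n, n_clusters))
        for i, (B, Y) in enumerate(zip(bases, trajectories)):
            for k in range(n_clusters):
                sse = np.sum((Y - B @ coefs[k]) ** 2)
                log_r[i, k] = (np.log(weights[k])
                               - 0.5 * Y.size * np.log(2.0 * np.pi * sigma2[k])
                               - 0.5 * sse / sigma2[k])
        log_r -= log_r.max(axis=1, keepdims=True)  # stabilize before exponentiating
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: weighted least squares for control points, then noise variance.
        weights = np.clip(resp.mean(axis=0), 1e-12, None)
        for k in range(n_clusters):
            r = resp[:, k]
            A = sum(ri * B.T @ B for ri, B in zip(r, bases))
            b = sum(ri * B.T @ Y for ri, B, Y in zip(r, bases, trajectories))
            coefs[k] = np.linalg.solve(A + 1e-8 * np.eye(degree + 1), b)
            sse = sum(ri * np.sum((Y - B @ coefs[k]) ** 2)
                      for ri, B, Y in zip(r, bases, trajectories))
            n_obs = sum(ri * Y.size for ri, Y in zip(r, trajectories))
            sigma2[k] = max(sse / max(n_obs, 1e-12), 1e-6)

    return resp.argmax(axis=1), np.array(coefs)
```

Because the basis is evaluated on each route's own rescaled time grid, routes with different numbers of tracking frames can share the same set of control points, which is one way to handle the varying play lengths mentioned above. In practice the number of clusters, the polynomial degree, and the noise model would all need to be chosen with care.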
