When to Pull Starting Pitchers in Major League Baseball? A Data Mining Approach

One of the most important decisions a manager makes in a baseball game is when to pull the starting pitcher. The decision directly affects both the outcome of the game and the physical fitness of the pitcher. Traditionally, managers rely on various heuristics to make it. In this paper, we propose a machine learning based approach to determine when to replace the starting pitcher. We curate a large dataset of more than one million samples, spanning more than 10 years of baseball games (2007-2017), and study the performance of various classification algorithms on this dataset. We further perform feature analysis to gain insight into the features that most strongly influence the replacement of starting pitchers. To the best of our knowledge, this is the first research effort to leverage machine learning and data analytics to model managers' decisions to pull a starting pitcher from historical data. Such a system can help managers make more informed decisions during an ongoing game and has the potential to reduce the risk of baseball-related injuries. We hope that our curated dataset and initial findings will promote further work on this important problem of deciding when to pull a starting pitcher in an ongoing baseball game.
