Identifying NCAA tournament upsets using Balance Optimization Subset Selection

Abstract: The NCAA basketball tournament attracts over 60 million people who fill out a bracket in an attempt to predict the outcome of every tournament game correctly. Predictions are often made on the basis of instinct, statistics, or a combination of the two. This paper proposes a technique for selecting round-of-64 upsets in the tournament using a Balance Optimization Subset Selection (BOSS) model. The model determines which games feature match-ups that are statistically most similar to the match-ups in historical upsets. The technique is applied to the tournament in each of the 13 years from 2003 to 2015 to select two games as potential upsets each year. Of the 26 selected games, 10 (38.5%) were actual upsets, which is more than twice the expected number of correct selections under a weighted random selection method.
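To make the idea of selecting the games "most similar to historical upsets" concrete, the following is a minimal sketch, not the model from the paper. It assumes each match-up is summarized by a numeric feature vector and measures balance as the squared Euclidean distance between the mean of the selected subset and the mean of the historical upsets; the function name `select_balanced_subset`, the balance measure, and all numbers are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def select_balanced_subset(candidates, reference, k=2):
    """Return the k candidate indices whose mean feature vector is closest
    (squared Euclidean distance) to the mean of the reference group.

    Assumption: balance is measured on feature means only; the paper's
    actual balance measure is not stated in the abstract.
    """
    n_features = len(reference[0])
    ref_mean = [
        sum(row[j] for row in reference) / len(reference)
        for j in range(n_features)
    ]

    def imbalance(subset):
        # Mean feature vector of the candidate subset.
        sub_mean = [
            sum(candidates[i][j] for i in subset) / len(subset)
            for j in range(n_features)
        ]
        return sum((s - r) ** 2 for s, r in zip(sub_mean, ref_mean))

    # Brute force over all k-subsets; fine for a 32-game round.
    return min(combinations(range(len(candidates)), k), key=imbalance)


# Made-up feature vectors, for illustration only.
historical_upsets = [[1.0, 2.0], [3.0, 4.0]]
this_year_games = [[0.0, 0.0], [2.0, 2.0], [2.0, 4.0], [9.0, 9.0]]

picks = select_balanced_subset(this_year_games, historical_upsets, k=2)
print(picks)
```

With two picks per year from the round of 64, the brute-force search covers only a few hundred subsets, so no specialized solver is needed for a sketch at this scale.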
