A new methodology for generating and combining statistical forecasting models to enhance competitive event prediction

Forecasting methods are routinely employed to predict the outcome of competitive events (CEs), such as sports events and political elections, and to shed light on the factors that influence participants’ winning prospects. Combining statistical models’ forecasts, which has proved highly successful in other settings, has been neglected in CE prediction. Two particular difficulties arise when developing model-based composite forecasts of CE outcomes: the intensity of rivalry among contestants, and the strength/diversity trade-off among individual models. To overcome these challenges, we propose a range of surrogate measures of event outcome to construct a heterogeneous set of base forecasts. To extract the complementary information concealed within these predictions, we develop a novel pooling mechanism that accounts for competition among contestants: a stacking paradigm integrating conditional logit regression and log-likelihood-ratio-based forecast selection. Empirical results using data related to horseracing events demonstrate that: (i) base model strength and diversity are both important when combining model-based predictions for CEs; (ii) average-based pooling, commonly employed elsewhere, may not be appropriate for CEs because it focuses exclusively on strength; and (iii) the proposed stacking ensemble provides statistically and economically accurate forecasts. These results have important implications for regulators of betting markets associated with CEs, in particular for the accurate assessment of market efficiency.
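The pooling idea can be illustrated with a minimal sketch, under stated assumptions. A conditional logit second stage turns a weighted sum of base forecasts into winning probabilities that sum to one within each race, which is how competition among contestants enters the combination. The function names, the toy data, and the fixed weights below are hypothetical and chosen only for illustration. In practice the weights would be estimated by maximum likelihood, and the abstract does not specify the exact selection procedure, so the likelihood-ratio statistic is shown only in its generic form.

```python
import numpy as np

def conditional_logit_probs(scores, race_ids):
    """Softmax of scores within each race, so probabilities sum to 1 per race."""
    scores = np.asarray(scores, dtype=float)
    race_ids = np.asarray(race_ids)
    probs = np.empty_like(scores)
    for race in np.unique(race_ids):
        mask = race_ids == race
        s = scores[mask]
        e = np.exp(s - s.max())  # subtract the max for numerical stability
        probs[mask] = e / e.sum()
    return probs

def stack_forecasts(base_forecasts, weights, race_ids):
    """Second-stage pooling: weighted sum of base forecasts, then within-race softmax."""
    scores = np.asarray(base_forecasts, dtype=float) @ np.asarray(weights, dtype=float)
    return conditional_logit_probs(scores, race_ids)

def log_likelihood(probs, won):
    """Sum of log winning probabilities over the actual winners."""
    return float(np.sum(np.log(probs[np.asarray(won, dtype=bool)])))

def lr_statistic(ll_full, ll_restricted):
    """Generic log-likelihood-ratio statistic for comparing nested models."""
    return 2.0 * (ll_full - ll_restricted)

# Toy data (hypothetical): two races of three runners, two base forecasts per runner.
race_ids = np.array([0, 0, 0, 1, 1, 1])
base = np.array([[0.2, 1.0], [0.5, 0.1], [0.3, -0.4],
                 [1.2, 0.0], [0.1, 0.3], [-0.5, 0.2]])
won = np.array([1, 0, 0, 1, 0, 0])

pooled = stack_forecasts(base, [1.0, 0.5], race_ids)    # illustrative weights, not estimated
only_first = stack_forecasts(base, [1.0, 0.0], race_ids)  # drops the second base forecast
print(lr_statistic(log_likelihood(pooled, won), log_likelihood(only_first, won)))
```

Because the weights here are fixed rather than fitted, the printed statistic only shows the mechanics of comparing a pool with and without one base forecast; it is not a valid test result.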
