All That Glitters Is Not Gold: Comparing Backtest and Out-of-Sample Performance on a Large Cohort of Trading Algorithms

When automated trading strategies are developed and evaluated using backtests on historical pricing data, there is a tendency to overfit to the past. Using a unique dataset of 888 algorithmic trading strategies developed and backtested on the Quantopian platform, each with at least 6 months of out-of-sample performance, we study the prevalence and impact of backtest overfitting. Specifically, we find that commonly reported backtest evaluation metrics like the Sharpe ratio offer little value in predicting out-of-sample performance (R²
