Backtesting

When evaluating a trading strategy, it is routine to discount the Sharpe ratio from a historical backtest. According to the authors, the reason is simple: data mining is inevitable, both by the researcher and by earlier researchers. In this article, the authors provide a statistical framework that systematically accounts for these multiple tests. They propose a method to determine the appropriate haircut for any given reported Sharpe ratio, and they provide a profit hurdle that any strategy must clear to be deemed “significant.”
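As a rough illustration of what a multiple-testing haircut on a Sharpe ratio can look like, the sketch below applies a plain Bonferroni correction. It is not taken from the abstract and should not be read as the authors' exact procedure. The function name `haircut_sharpe`, its parameters, the normal approximation to the t-distribution, and the assumption that the reported Sharpe ratio is positive and annualized are all illustrative choices.

```python
from math import sqrt
from statistics import NormalDist


def haircut_sharpe(sharpe, years, n_tests):
    """Illustrative Bonferroni-style haircut (hypothetical helper).

    Assumes `sharpe` is a positive annualized Sharpe ratio estimated
    over `years` years, and that `n_tests` strategies were tried.
    Uses a normal approximation instead of the Student t-distribution.
    """
    norm = NormalDist()
    # Sharpe ratio -> t-statistic under the assumptions above.
    t_stat = sharpe * sqrt(years)
    # Two-sided p-value for a single test.
    p_single = 2.0 * norm.cdf(-abs(t_stat))
    # Bonferroni: scale the p-value by the number of tests, capped at 1.
    p_multi = min(1.0, n_tests * p_single)
    if p_multi <= 0.0:
        # p-value underflowed to zero; no measurable haircut.
        return sharpe, 0.0
    # Convert the adjusted p-value back to a test statistic.
    z_adj = -norm.inv_cdf(p_multi / 2.0)
    adjusted = z_adj / sqrt(years)
    haircut = 1.0 - adjusted / sharpe
    return adjusted, haircut


# Example: Sharpe ratio of 1.0 over 10 years, 10 strategies tried.
adjusted, haircut = haircut_sharpe(1.0, 10, 10)
print(f"haircut Sharpe ratio: {adjusted:.3f}, haircut: {haircut:.1%}")
```

In this sketch the haircut grows with the number of tests and shrinks toward zero when only one strategy was tried. Bonferroni is the most conservative of the common corrections, so it tends to give the largest haircut.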
