Presidential Address: The Scientific Outlook in Financial Economics

Given the competition for top journal space, there is an incentive to produce “significant” results. With the combination of unreported tests, lack of adjustment for multiple tests, and direct and indirect p-hacking, many of the results being published will fail to hold up in the future. In addition, there are basic issues with the interpretation of statistical significance. Increasing thresholds may be necessary, but still may not be sufficient: if the effect being studied is rare, even t > 3 will produce a large number of false positives. Here I explore the meaning and limitations of a p-value. I offer a simple alternative (the minimum Bayes factor). I present guidelines for a robust, transparent research culture in financial economics. Finally, I offer some thoughts on the importance of risk-taking (from the perspective of authors and editors) to advance
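The point about rare effects and t > 3 can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from the address: it assumes the common normal-approximation form of the minimum Bayes factor, exp(-t²/2), which may differ from the exact variant the author uses, and the function names and prior values are hypothetical choices made for this example.

```python
import math


def minimum_bayes_factor(t_stat: float) -> float:
    """Lower bound on the Bayes factor for the null against the alternative.

    Assumption: the normal-approximation bound exp(-t^2 / 2).
    """
    return math.exp(-(t_stat ** 2) / 2.0)


def posterior_null_probability(t_stat: float, prior_null: float) -> float:
    """Smallest posterior probability of the null implied by the bound.

    prior_null is the prior probability that the null is true (0 < prior_null < 1).
    """
    prior_odds = prior_null / (1.0 - prior_null)
    posterior_odds = minimum_bayes_factor(t_stat) * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Hypothetical priors: even odds, a long shot, and a rare effect.
    for prior_null in (0.5, 0.8, 0.99):
        post = posterior_null_probability(3.0, prior_null)
        print(f"prior P(null) = {prior_null:.2f} -> posterior P(null) >= {post:.3f}")
```

Under these assumptions, a t-statistic of 3 with a 99% prior probability that the null is true still leaves the null slightly more likely than not (a posterior of roughly 0.52), whereas with even prior odds the same t-statistic pushes the null's posterior down to about 1%. Because the minimum Bayes factor is a bound that favors the alternative as much as the data allow, the actual posterior probability of the null can only be higher.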
