The Jeffreys–Lindley paradox and discovery criteria in high energy physics

The Jeffreys–Lindley paradox displays how the use of a $p$ value (or number of standard deviations $z$) in a frequentist hypothesis test can lead to an inference that is radically different from that of a Bayesian hypothesis test in the form advocated by Harold Jeffreys in the 1930s and common today. The setting is the test of a well-specified null hypothesis (such as the Standard Model of elementary particle physics, possibly with “nuisance parameters”) versus a composite alternative (such as the Standard Model plus a new force of nature of unknown strength). The $p$ value, as well as the ratio of the likelihood under the null hypothesis to the maximized likelihood under the alternative, can strongly disfavor the null hypothesis, while the Bayesian posterior probability for the null hypothesis can be arbitrarily large. The academic statistics literature contains many impassioned comments on this paradox, yet there is no consensus either on its relevance to scientific communication or on its correct resolution. The paradox is quite relevant to frontier research in high energy physics. This paper is an attempt to explain the situation to both physicists and statisticians, in the hope that further progress can be made.
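
The divergence described above can be made concrete with a small numerical sketch. The setup is an illustrative assumption, not taken from the paper: $n$ Gaussian measurements with known standard deviation $\sigma$, a point null $\theta = 0$, and, under the alternative, a Gaussian prior on $\theta$ centered at zero with width $\tau$. The function names and the default choices `tau_over_sigma=1.0` and `prior_null=0.5` are hypothetical. Under these assumptions the Bayes factor in favor of the null is $B_{01} = \sqrt{1+r}\,\exp\!\left[-\tfrac{1}{2} z^2\, r/(1+r)\right]$ with $r = n\tau^2/\sigma^2$.

```python
import math


def two_sided_p_value(z: float) -> float:
    """Two-sided Gaussian tail probability for an observed deviation of z sigma."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def max_likelihood_ratio(z: float) -> float:
    """Likelihood under the null divided by the maximized likelihood under the alternative."""
    return math.exp(-0.5 * z * z)


def bayes_factor_01(z: float, n: float, tau_over_sigma: float = 1.0) -> float:
    """Bayes factor for the point null versus a Gaussian-prior alternative.

    Assumption (illustrative only): theta ~ N(0, tau^2) under the alternative,
    and the sample mean of n measurements has variance sigma^2 / n.
    """
    r = n * tau_over_sigma ** 2  # prior variance over the variance of the sample mean
    return math.sqrt(1.0 + r) * math.exp(-0.5 * z * z * r / (1.0 + r))


def posterior_prob_null(z: float, n: float, prior_null: float = 0.5,
                        tau_over_sigma: float = 1.0) -> float:
    """Posterior probability of the null, given its prior probability."""
    odds = bayes_factor_01(z, n, tau_over_sigma) * prior_null / (1.0 - prior_null)
    return odds / (1.0 + odds)


if __name__ == "__main__":
    z = 5.0  # hold the observed significance fixed while the sample size grows
    print(f"p value = {two_sided_p_value(z):.3g}, "
          f"max likelihood ratio = {max_likelihood_ratio(z):.3g}")
    for n in (1e2, 1e6, 1e10, 1e14):
        print(f"n = {n:.0e}: B01 = {bayes_factor_01(z, n):.3g}, "
              f"P(H0 | data) = {posterior_prob_null(z, n):.3g}")
```

With $z$ held fixed, the $p$ value and the maximized likelihood ratio do not depend on $n$, whereas $B_{01}$ grows roughly like $\sqrt{n}$ once $r \gg 1$. In this toy model, a sufficiently large sample therefore pushes the posterior probability of the null toward one even though the frequentist summaries continue to disfavor it.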
