THE IRRECONCILABILITY OF P-VALUES AND EVIDENCE

We place the paper [Berger-Sellke] by Berger and Sellke in the context of the discussion about the validity of p-values and alternative methods for quantifying evidence against a point null hypothesis.

Date: June 29, 2010.

1. Prologue: Quantifying Evidence against a Point Null Hypothesis

Fisher's p-values are everywhere in empirical science. Quoting a recent provocative paper [1], which addressed the (ir)reproducibility crisis in medical science: "Research is not most appropriately represented and summarized by p-values, but, unfortunately, there is a widespread notion that medical research articles should be interpreted based only on p-values." Somewhat more belligerent is [2]: "And we, as teachers, consultants, authors, and otherwise perpetrators of quantitative methods, are responsible for the ritualization of null hypothesis significance testing [...] to the point of meaninglessness and beyond."

This is a short review of the part philosophical, part statistical, part scientific discussion within the statistical community about the Big Question:

What is the correct way to quantify and weigh empirical evidence against a point null hypothesis?

For the purposes of this short text, it is easier to focus on the simplest and most common case, so we will instead consider the Small Question:

Given a sample $x = (x_1, \ldots, x_n)$ of $X_1, \ldots, X_n \overset{\text{iid}}{\sim} N(\theta, \sigma^2)$ ($\sigma^2$ known), what is the correct way to quantify and weigh empirical evidence against the hypothesis of no effect, $H_0 : \theta = 0$?
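
To make the setting of the Small Question concrete, the following minimal Python sketch computes the classical two-sided p-value for H0 : θ = 0 from a normal sample with known σ, using the standard statistic z = √n · x̄ / σ. The function name `two_sided_p_value` and the sample values are illustrative assumptions, not taken from the paper. The sketch only shows the quantity under discussion and does not endorse it as a measure of evidence.

```python
import math


def two_sided_p_value(sample, sigma):
    """Two-sided p-value for H0: theta = 0, assuming iid N(theta, sigma^2)
    observations with known standard deviation sigma (hypothetical helper)."""
    n = len(sample)
    mean = sum(sample) / n
    # Standardized sample mean; under H0 it follows N(0, 1).
    z = math.sqrt(n) * mean / sigma
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Made-up data, for illustration only.
sample = [0.8, -0.3, 1.1, 0.4, 0.6]
p = two_sided_p_value(sample, sigma=1.0)
print(f"two-sided p-value: {p:.4f}")
```

Only the standard library is used; `math.erfc` avoids the loss of precision that `1 - Phi(|z|)` would suffer for large |z|.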