Trustworthiness of statistical inference

We examine the role of trustworthiness and trust in statistical inference, arguing that it is the extent to which inferential statistical tools are trustworthy that enables trust in the conclusions. Certain tools, such as the p-value and the significance test, have recently come under renewed criticism, with some arguing that they damage trust in statistics. We argue the contrary, beginning from the position that the central role of these methods is to form the basis for trusted conclusions in the face of uncertainty in the data, and noting that it is the misuse and misunderstanding of these tools which damage trustworthiness and hence trust. We go on to argue that recent calls to ban these tools tackle the symptom, not the cause, and themselves risk damaging the capability of science to advance, as well as feeding public suspicion of the discipline of statistics. The consequence could be aggravated mistrust of our discipline and of science more generally. In short, these very proposals could work in the opposite direction to that intended. We make some alternative proposals for tackling the misuse and misunderstanding of these methods, and for how trust in our discipline might be promoted.
