False confidence, non-additive beliefs, and valid statistical inference

Statistics has made tremendous advances since the time of Fisher, Neyman, Jeffreys, and others, but the fundamental and practically relevant questions about probability and inference that puzzled our founding fathers remain unanswered. To bridge this gap, I propose to look beyond the two dominant schools of thought and ask the following three questions: what do scientists need out of statistics, do the existing frameworks meet these needs, and, if not, how can the void be filled? To the first question, I contend that scientists seek to convert their data, posited statistical model, etc., into calibrated degrees of belief about quantities of interest. To the second question, I argue that any framework that returns additive beliefs, i.e., probabilities, necessarily suffers from {\em false confidence}, meaning that certain false hypotheses tend to be assigned high probability, and therefore risks systematic bias. This reveals the fundamental importance of {\em non-additive beliefs} in the context of statistical inference. But non-additivity alone is not enough, so, to the third question, I offer a sufficient condition, called {\em validity}, for avoiding false confidence, and present a framework, based on random sets and belief functions, that provably meets this condition. Finally, I discuss characterizations of p-values and confidence intervals in terms of valid non-additive beliefs, which imply that users of these classical procedures are already following the proposed framework without knowing it.
