Severe Testing as a Basic Concept in a Neyman–Pearson Philosophy of Induction

Despite the widespread use of key concepts of the Neyman–Pearson (N–P) statistical paradigm—type I and II errors, significance levels, power, confidence levels—they have been the subject of philosophical controversy and debate for over 60 years. Both current and long-standing problems of N–P tests stem from unclarity and confusion, even among N–P adherents, as to how a test's (pre-data) error probabilities are to be used for (post-data) inductive inference as opposed to inductive behavior. We argue that the relevance of error probabilities is to ensure that only statistical hypotheses that have passed severe or probative tests are inferred from the data. The severity criterion supplies a meta-statistical principle for evaluating proposed statistical inferences, avoiding classic fallacies from tests that are overly sensitive, as well as those not sensitive enough to particular errors and discrepancies. (A brief numerical sketch of the severity assessment for test T(α) is given below, following the contents.)

Contents
1. Introduction and overview
   1.1 Behavioristic and inferential rationales for Neyman–Pearson (N–P) tests
   1.2 Severity rationale: induction as severe testing
   1.3 Severity as a meta-statistical concept: three required restrictions on the N–P paradigm
2. Error statistical tests from the severity perspective
   2.1 N–P test T(α): type I, II error probabilities and power
   2.2 Specifying test T(α) using p-values
3. Neyman's post-data use of power
   3.1 Neyman: does failure to reject H warrant confirming H?
4. Severe testing as a basic concept for an adequate post-data inference
   4.1 The severity interpretation of acceptance (SIA) for test T(α)
   4.2 The fallacy of acceptance (i.e., an insignificant difference): Ms Rosy
   4.3 Severity and power
5. Fallacy of rejection: statistical vs. substantive significance
   5.1 Taking a rejection of H0 as evidence for a substantive claim or theory
   5.2 A statistically significant difference from H0 may fail to indicate a substantively important magnitude
   5.3 Principle for the severity interpretation of a rejection (SIR)
   5.4 Comparing significant results with different sample sizes in T(α): large n problem
   5.5 General testing rules for T(α), using the severe testing concept
6. The severe testing concept and confidence intervals
   6.1 Dualities between one- and two-sided intervals and tests
   6.2 Avoiding shortcomings of confidence intervals
7. Beyond the N–P paradigm: pure significance and misspecification tests
8. Concluding comments: have we shown severity to be a basic concept in a N–P philosophy of induction?
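As a rough illustration of the severity assessments referred to in sections 4 and 5 (SIA and SIR), the following is a minimal numerical sketch, not code from the paper: it assumes the simple one-sided Normal test T(α) of H0: μ ≤ μ0 versus H1: μ > μ0 with known σ, and the function name `severity` and the example numbers are our own.

```python
# Minimal sketch (our own, not from the paper) of post-data severity for the
# one-sided Normal test T(alpha): H0: mu <= mu0 vs H1: mu > mu0, sigma known,
# test statistic d(X) = sqrt(n) * (Xbar - mu0) / sigma.
# Assumed definitions:
#   SIR (after rejecting H0):  SEV(mu > mu1)  = P(d(X) <= d_obs; mu = mu1)
#   SIA (after accepting H0):  SEV(mu <= mu1) = P(d(X) >  d_obs; mu = mu1)
from math import sqrt
from scipy.stats import norm

def severity(xbar, mu0, mu1, sigma, n, rejected):
    """Severity with which the claim about mu1 passes, given observed mean xbar."""
    d_obs = sqrt(n) * (xbar - mu0) / sigma   # observed test statistic
    shift = sqrt(n) * (mu1 - mu0) / sigma    # mean of d(X) when mu = mu1
    if rejected:
        # SEV(mu > mu1): how improbable so large a d_obs would be were mu only mu1
        return norm.cdf(d_obs - shift)
    # SEV(mu <= mu1): how probable a larger d(X) would be were mu as great as mu1
    return 1 - norm.cdf(d_obs - shift)

# Example: mu0 = 0, sigma = 1, n = 100, observed xbar = 0.25, so d_obs = 2.5
# (statistically significant at alpha = 0.025).
print(severity(xbar=0.25, mu0=0, mu1=0.1, sigma=1, n=100, rejected=True))  # ~0.93
print(severity(xbar=0.25, mu0=0, mu1=0.3, sigma=1, n=100, rejected=True))  # ~0.31
```

On this sketch, one and the same statistically significant result warrants the inference μ > 0.1 with high severity but not μ > 0.3, which is the kind of post-data discrimination between statistical and substantive significance that the severity criterion is meant to supply.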
