Beyond statistical inference: A decision theory for science

Traditional null hypothesis significance testing does not yield the probability of the null hypothesis or of its alternative and therefore cannot logically ground scientific decisions. The decision theory proposed here calculates the expected utility of an effect on the basis of (1) the probability of replicating it and (2) a utility function on its size. It takes significance tests (which place all value on the replicability of an effect and none on its magnitude) as a special case, one in which the cost of a false positive is revealed to be an order of magnitude greater than the value of a true positive. More realistic utility functions credit both replicability and effect size, integrating them into a single index of merit. The analysis incorporates opportunity cost and is consistent with alternative measures of effect size, such as r² and information transmission, and with Bayesian model selection criteria. An alternative formulation is functionally equivalent to the formal theory, transparent, and easy to compute.
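
As a rough illustration of how replicability and effect size might be combined into one index, the sketch below computes an expected utility for an observed effect. It is not the paper's formulation: the normal predictive distribution for a replication (variance twice the squared standard error), the power-law gain on effect size, the fixed cost for a replication of the wrong sign, and the names `p_rep`, `expected_utility`, `gain`, `cost`, and `gamma` are all assumptions made here for illustration. It also assumes the observed effect is positive.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_pdf(x, mean, sd):
    """Normal density with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def p_rep(d_obs, se):
    """Probability that a replication has the same sign as d_obs.

    Assumption: the replication effect is normally distributed around
    d_obs with variance 2 * se**2 (sampling error enters twice).
    """
    return normal_cdf(d_obs / (math.sqrt(2.0) * se))


def expected_utility(d_obs, se, gain=1.0, cost=1.0, gamma=1.0, steps=4000):
    """Expected utility of an observed positive effect (illustrative only).

    Assumed utility of a replication effect d2:
      gain * d2**gamma  if d2 > 0   (credit grows with effect size)
      -cost             otherwise   (fixed penalty for a wrong-sign result)
    """
    sd_rep = math.sqrt(2.0) * se
    # Loss part: probability of a wrong-sign replication times its cost.
    loss = cost * (1.0 - p_rep(d_obs, se))
    # Gain part: trapezoidal integration over positive replication effects.
    upper = max(d_obs, 0.0) + 8.0 * sd_rep
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        d2 = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * gain * d2 ** gamma * normal_pdf(d2, d_obs, sd_rep)
    return total * h - loss


if __name__ == "__main__":
    # Two effects with the same replicability but different magnitudes.
    print(p_rep(0.5, 0.25), p_rep(1.0, 0.5))
    print(expected_utility(0.5, 0.25), expected_utility(1.0, 0.5))
```

In this sketch, setting `gamma=0` makes the utility a step function that ignores magnitude, so the expected utility reduces to `gain * p_rep - cost * (1 - p_rep)` and is positive only when `p_rep` exceeds `cost / (gain + cost)`. With `gamma > 0`, two effects of equal replicability are ranked by their size.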
