Paper information - Misprescription and misuse of one‐tailed tests

Misprescription and misuse of one‐tailed tests

One-tailed statistical tests are often used in ecology, animal behaviour and in most other fields in the biological and social sciences. Here we review the frequency of their use in the 1989 and 2005 volumes of two journals (Animal Behaviour and Oecologia), their advantages and disadvantages, the extensive erroneous advice on them in both older and modern statistics texts and their utility in certain narrow areas of applied research. Of those articles with data sets susceptible to one-tailed tests, at least 24% in Animal Behaviour and at least 13% in Oecologia used one-tailed tests at least once. They were used 35% more frequently with nonparametric methods than with parametric ones and about twice as often in 1989 as in 2005. Debate in the psychological literature of the 1950s established the logical criterion that one-tailed tests should be restricted to situations where there is interest only in results in one direction. 'Interest' should be defined, however, in terms of collective or societal interest and not by the individual investigator. By this 'collective interest' criterion, all uses of one-tailed tests in the journals surveyed seem invalid. In his book Nonparametric Statistics, S. Siegel unrelentingly suggested the use of one-tailed tests whenever the investigator predicts the direction of a result. That work has been a major proximate source of confusion on this issue, but so are most recent statistics textbooks. The utility of one-tailed tests in research aimed at obtaining regulatory approval of new drugs and new pesticides is briefly described, to exemplify the narrow range of research situations where such tests can be appropriate. These situations are characterized by null hypotheses stating that the difference or effect size does not exceed, or is at least as great as, some 'amount of practical interest'. One-tailed tests rarely should be used for basic or applied research in ecology, animal behaviour or any other science.
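As a rough illustration of the distinction the abstract draws, the sketch below compares one-tailed and two-tailed p-values for the same result, then shows a one-tailed test against a margin of practical interest of the kind described for regulatory research. It is not taken from the paper: the z-test with a known standard error, the function name `z_test_p_values`, and all numbers are assumptions chosen only for illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_test_p_values(estimate, std_error, null_value=0.0):
    """Return (upper one-tailed p, two-tailed p) for a simple z-test.

    Hypothetical helper: assumes the estimate is approximately normal
    with a known standard error.
    """
    z = (estimate - null_value) / std_error
    p_upper = 1.0 - STANDARD_NORMAL.cdf(z)
    p_two_sided = 2.0 * (1.0 - STANDARD_NORMAL.cdf(abs(z)))
    return p_upper, p_two_sided


# Illustrative numbers only: an observed difference of 1.8 with SE 1.0.
p_one, p_two = z_test_p_values(estimate=1.8, std_error=1.0)
print(f"one-tailed p = {p_one:.4f}, two-tailed p = {p_two:.4f}")

# One-tailed test against a margin of practical interest (assumed delta = 1.0):
# H0: effect <= delta  versus  H1: effect > delta.
delta = 1.0
p_margin, _ = z_test_p_values(estimate=3.0, std_error=1.0, null_value=delta)
print(f"p for H0 'effect does not exceed {delta}': {p_margin:.4f}")
```

With these made-up inputs, the same observed difference falls below 0.05 under the one-tailed test but not under the two-tailed test, which is the practical consequence of choosing one over the other. The second call shows the kind of null hypothesis the abstract associates with regulatory settings, where only exceeding a stated margin is of interest.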

Celia M. Lombardi | Stuart H. Hurlbert
