Misprescription and misuse of one‐tailed tests

One-tailed statistical tests are often used in ecology, animal behaviour and most other fields of the biological and social sciences. Here we review the frequency of their use in the 1989 and 2005 volumes of two journals (Animal Behaviour and Oecologia), their advantages and disadvantages, the extensive erroneous advice on them in both older and modern statistics texts, and their utility in certain narrow areas of applied research. Of those articles with data sets susceptible to one-tailed tests, at least 24% in Animal Behaviour and at least 13% in Oecologia used one-tailed tests at least once. They were used 35% more frequently with nonparametric methods than with parametric ones, and about twice as often in 1989 as in 2005. Debate in the psychological literature of the 1950s established the logical criterion that one-tailed tests should be restricted to situations where there is interest only in results in one direction. 'Interest' should be defined, however, in terms of collective or societal interest, not the interest of the individual investigator. By this 'collective interest' criterion, all uses of one-tailed tests in the journals surveyed seem invalid. In his book Nonparametric Statistics, S. Siegel unrelentingly suggested the use of one-tailed tests whenever the investigator predicts the direction of a result. That work has been a major proximate source of confusion on this issue, as have most recent statistics textbooks. The utility of one-tailed tests in research aimed at obtaining regulatory approval of new drugs and new pesticides is briefly described, to exemplify the narrow range of research situations where such tests can be appropriate. These situations are characterized by null hypotheses stating that the difference or effect size does not exceed, or is at least as great as, some 'amount of practical interest'. One-tailed tests should rarely be used for basic or applied research in ecology, animal behaviour or any other science.
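The contrast between one- and two-tailed tests can be made concrete with a small calculation. This sketch is not taken from the paper: it assumes a large-sample z-test (standard normal reference distribution, known standard error), and the function names (`z_stat`, `p_two_tailed`, `p_upper_tail`, `p_lower_tail`) and all numbers are hypothetical, chosen only for illustration. It shows three points. First, for a symmetric statistic, the one-tailed p-value for a result in the predicted direction is exactly half the two-tailed one. Second, a one-tailed test must disregard even a large effect in the unpredicted direction. Third, in the regulatory-style setting the null hypothesis states that the effect is at least as great as an amount of practical interest, so only one direction is relevant to the question.

```python
from statistics import NormalDist

# Standard normal distribution: the assumed null reference distribution of z.
STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def z_stat(estimate: float, se: float, null_value: float = 0.0) -> float:
    """z = (estimate - value stated in H0) / standard error (hypothetical helper)."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    return (estimate - null_value) / se


def p_two_tailed(z: float) -> float:
    """P(|Z| >= |z|) under H0: deviations in either direction count against H0."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def p_upper_tail(z: float) -> float:
    """P(Z >= z) under H0: one-tailed p-value for H1 'effect > null_value'."""
    return 1.0 - STD_NORMAL.cdf(z)


def p_lower_tail(z: float) -> float:
    """P(Z <= z) under H0: one-tailed p-value for H1 'effect < null_value'."""
    return STD_NORMAL.cdf(z)


if __name__ == "__main__":
    # Hypothetical result in the predicted (positive) direction.
    z = 1.8
    print(f"z = {z}: two-tailed p = {p_two_tailed(z):.4f}, "
          f"one-tailed p = {p_upper_tail(z):.4f}")

    # Hypothetical large result in the unpredicted (negative) direction,
    # evaluated with the same upper-tail test the investigator pre-specified.
    z = -2.5
    print(f"z = {z}: two-tailed p = {p_two_tailed(z):.4f}, "
          f"one-tailed p = {p_upper_tail(z):.4f}")

    # Regulatory-style margin test with hypothetical numbers:
    # H0: true reduction >= 10 units (the assumed amount of practical interest);
    # H1: true reduction < 10 units. Only the lower tail is evidence against H0.
    z = z_stat(estimate=4.0, se=2.5, null_value=10.0)
    print(f"margin test: z = {z:.2f}, one-tailed p = {p_lower_tail(z):.4f}")
```

With these inputs the first result falls below 0.05 only when tested one-tailed (about 0.036 versus about 0.072), while the second has a two-tailed p of about 0.012 but a one-tailed p of about 0.99. The margin test gives z = -2.40 and a one-tailed p of about 0.008, so the hypothetical reduction would be judged smaller than the 10-unit threshold. The arithmetic does not justify a one-tailed test; that depends on whether only one direction is of interest.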
