Equivalence Tests: A Practical Primer for t Tests, Correlations, and Meta-Analyses

Scientists should be able to provide support for the absence of a meaningful effect. Currently, researchers often incorrectly conclude that an effect is absent based on a nonsignificant result. A widely recommended approach within a frequentist framework is to test for equivalence. In equivalence tests, such as the two one-sided tests (TOST) procedure discussed in this article, upper and lower equivalence bounds are specified based on the smallest effect size of interest. The TOST procedure can then be used to statistically reject the presence of effects large enough to be considered worthwhile. This practical primer, with an accompanying spreadsheet and R package, enables psychologists to easily perform equivalence tests (and power analyses) by setting equivalence bounds based on standardized effect sizes, and it provides recommendations for prespecifying equivalence bounds. Extending your statistical tool kit with equivalence tests is an easy way to improve your statistical and theoretical inferences.
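To make the logic of the TOST procedure concrete, the following is a minimal sketch in Python, not the accompanying spreadsheet or R package. It assumes a two-group comparison from summary statistics and, to stay within the standard library, uses a large-sample normal (z) approximation instead of the t distribution, so its p values will be somewhat too small for small samples. The function name `tost_two_sample_z` and all parameter names are hypothetical. The equivalence bounds are given as standardized effect sizes (Cohen's d) and converted to raw units with the pooled standard deviation.

```python
from math import sqrt
from statistics import NormalDist


def tost_two_sample_z(m1, sd1, n1, m2, sd2, n2, low_d, high_d, alpha=0.05):
    """Illustrative TOST for a difference in means (hypothetical helper).

    Assumption: a normal approximation replaces the t distribution,
    which is only reasonable for fairly large samples.
    """
    # Convert the standardized bounds (Cohen's d) to raw-score bounds.
    sd_pooled = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    low_raw = low_d * sd_pooled
    high_raw = high_d * sd_pooled

    diff = m1 - m2
    se = sqrt(sd1**2 / n1 + sd2**2 / n2)  # unpooled standard error

    # One-sided test 1: H0 is diff <= lower bound; large z rejects it.
    z_lower = (diff - low_raw) / se
    p_lower = 1 - NormalDist().cdf(z_lower)

    # One-sided test 2: H0 is diff >= upper bound; very negative z rejects it.
    z_upper = (diff - high_raw) / se
    p_upper = NormalDist().cdf(z_upper)

    # Equivalence is declared only if BOTH one-sided tests reject,
    # so the overall p value is the larger of the two.
    p_tost = max(p_lower, p_upper)
    return {
        "p_lower": p_lower,
        "p_upper": p_upper,
        "p_tost": p_tost,
        "equivalent": p_tost < alpha,
    }


if __name__ == "__main__":
    # Made-up summary statistics, for illustration only.
    result = tost_two_sample_z(5.0, 1.0, 200, 5.0, 1.0, 200, -0.5, 0.5)
    print(result)
```

With identical group means, the sketch declares equivalence when the samples are large enough for the standard error to be small relative to the bounds, and fails to do so with small samples. This illustrates why equivalence tests need their own power analysis.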
