Wilcoxon-Mann-Whitney or t-test? On assumptions for hypothesis tests and multiple interpretations of decision rules.

In a mathematical approach to hypothesis tests, we start with a clearly defined set of hypotheses and choose the test with the best properties for those hypotheses. In practice, we often start with less precise hypotheses. For example, a researcher often wants to know which of two groups generally has the larger responses, and either a t-test or a Wilcoxon-Mann-Whitney (WMW) test could be acceptable. Although t-tests and WMW tests are usually associated with quite different hypotheses, the decision rule and p-value from either test could be associated with many different sets of assumptions, which we call perspectives. It is useful to collect in one place many of the perspectives to which a decision rule may be applied, since each perspective allows a different interpretation of the associated p-value. Here we collect many such perspectives for the two-sample t-test, the WMW test and other related tests. We discuss validity and consistency under each perspective and make recommendations for choosing between the tests in light of these many different perspectives. Finally, we briefly discuss a decision rule for testing genetic neutrality, where knowledge of the many perspectives is vital to the proper interpretation of the decision rule.
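As a minimal sketch of how one data set can feed two different decision rules, the Python below computes two statistics on the same pair of samples: the Mann-Whitney U count, whose ratio U/(mn) estimates Pr(X < Y) + 0.5 Pr(X = Y), and the difference in sample means. Each statistic gets an exact one-sided permutation p-value by enumerating every relabeling of the pooled data. The function names (`mw_u`, `mean_diff`, `permutation_pvalue`) and the data are hypothetical and chosen only for illustration; this is not the authors' code, and full enumeration is practical only for small samples.

```python
from itertools import combinations
from statistics import mean


def mw_u(x, y):
    """Mann-Whitney U: count pairs (x_i, y_j) with y_j > x_i; ties count 1/2.

    U / (len(x) * len(y)) estimates Pr(X < Y) + 0.5 * Pr(X = Y).
    """
    return sum(1.0 if yj > xi else 0.5 if yj == xi else 0.0
               for xi in x for yj in y)


def mean_diff(x, y):
    """Difference in sample means, mean(y) - mean(x)."""
    return mean(y) - mean(x)


def permutation_pvalue(x, y, stat):
    """Exact one-sided permutation p-value.

    Returns the share of all splits of the pooled data into groups of the
    original sizes whose statistic is at least the observed value.
    """
    pooled = list(x) + list(y)
    n_total = len(pooled)
    observed = stat(x, y)
    hits = total = 0
    for idx in combinations(range(n_total), len(x)):
        chosen = set(idx)
        xs = [pooled[i] for i in idx]
        ys = [pooled[i] for i in range(n_total) if i not in chosen]
        total += 1
        if stat(xs, ys) >= observed - 1e-12:  # small tolerance for round-off
            hits += 1
    return hits / total


# Hypothetical toy data, used only for illustration.
x = [1.1, 2.3, 2.9, 3.4]
y = [2.7, 4.1, 5.0, 12.0]

print(mw_u(x, y) / (len(x) * len(y)))       # 0.875
print(permutation_pvalue(x, y, mw_u))       # 4/70, about 0.057
print(permutation_pvalue(x, y, mean_diff))  # 3/70, about 0.043
```

On these toy numbers the mean-difference permutation test gives p = 3/70 while the WMW permutation test gives p = 4/70, so at a one-sided 0.05 level the two decision rules reach different decisions on the same data.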
