The Limited Role of Formal Statistical Inference in Scientific Inference

ABSTRACT Such is the grip of formal methods of statistical inference (that is, frequentist methods for generalizing from sample to population in enumerative studies) on the drawing of scientific inferences that the two are routinely deemed equivalent in the social, management, and biomedical sciences. This is so despite the fact that the legitimate use of these methods is difficult to achieve on practical grounds alone. But even supposing that these procedures were simple to adopt would not get us far; crucially, methods of formal statistical inference are ill-suited to the analysis of much scientific data. Even findings from randomized controlled trials, the claimed gold standard for examination by these methods, can be problematic. Scientific inference is a far broader concept than statistical inference. Its authority derives from the accumulation, over an extensive period of time, of both theoretical and empirical knowledge that has won the (provisional) acceptance of the scholarly community. A major focus of scientific inference can be viewed as the pursuit of significant sameness, meaning replicable and empirically generalizable results among phenomena. Regrettably, the obsession of users of statistical inference with reporting significant differences in data sets actively thwarts cumulative knowledge development. The manifold problems surrounding the implementation and usefulness of formal methods of statistical inference in advancing science do not speak well of much of the teaching in methods and statistics classes. Serious reflection on the role of statistics in producing viable knowledge is needed. Commendably, the American Statistical Association is committed to addressing this challenge, as further witnessed in this special online, open-access issue of The American Statistician.
