The Perils of Balance Testing in Experimental Design: Messy Analyses of Clean Data

ABSTRACT Widespread concern over the credibility of published results has led to scrutiny of statistical practices. We address one aspect of this problem that stems from the use of balance tests in conjunction with experimental data. When random assignment is botched, due either to mistakes in implementation or to differential attrition, balance tests can be an important tool in determining whether to treat the data as observational rather than experimental. Unfortunately, the use of balance tests has become commonplace in analyses of “clean” data, that is, data for which random assignment can be stipulated. Here, we show that balance tests can destroy the basis on which scientific conclusions are formed, and can lead to erroneous and even fraudulent conclusions. We conclude by advocating that scientists and journal editors resist the use of balance tests in all analyses of clean data. Supplementary materials for this article are available online.
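As a rough illustration of why a balance test carries little information when random assignment can be stipulated, the sketch below simulates many clean experiments and counts how often a covariate "fails" a balance test purely by chance. It is not taken from the article: the function names, sample sizes, standard-normal covariate, and the normal-approximation test are all assumptions chosen for the example.

```python
import math
import random
import statistics


def balance_test_p_value(treated, control):
    """Two-sided p-value for a difference in covariate means.

    Assumption: a large-sample normal approximation stands in for
    whatever balance test an analyst might actually run.
    """
    se = math.sqrt(
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    )
    z = (statistics.fmean(treated) - statistics.fmean(control)) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


def rejection_rate(n_experiments=2000, n_units=400, alpha=0.05, seed=0):
    """Share of correctly randomized experiments whose balance test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_experiments):
        # A pre-treatment covariate, drawn before assignment (hypothetical data).
        covariate = [rng.gauss(0.0, 1.0) for _ in range(n_units)]
        # Clean random assignment: shuffle, then treat the first half.
        rng.shuffle(covariate)
        half = n_units // 2
        p = balance_test_p_value(covariate[:half], covariate[half:])
        rejections += p < alpha
    return rejections / n_experiments


rate = rejection_rate()
print(f"share of clean experiments failing the balance test: {rate:.3f}")
```

Under these assumptions the share should sit near the nominal level alpha, since the null hypothesis of the balance test is true by construction. A "failed" test in clean data therefore reflects chance alone, and any analysis choice made in response to it is conditioned on noise.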
