Covariate balance in simple stratified and clustered comparative studies

In randomized experiments, treatment and control groups should be roughly the same (balanced) in their distributions of pretreatment variables. But how nearly so? Can descriptive comparisons meaningfully be paired with significance tests? If so, should there be several such tests, one for each pretreatment variable, or should there be a single, omnibus test? Could such a test be engineered to give easily computed p-values that are reliable in samples of moderate size, or would simulation be needed for reliable calibration? What new concerns are introduced by random assignment of clusters? Which tests of balance would be optimal? To address these questions, Fisher's randomization inference is applied to the question of balance. Its application suggests the reversal of published conclusions about two studies, one clinical and the other a field experiment in political participation.
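As a rough illustration of the idea of applying randomization inference to balance, the sketch below calibrates a single omnibus balance statistic by simulation. It is not the paper's procedure: the statistic (a sum of squared standardized mean differences), the function names, and the assumption of complete randomization of individuals with no strata or clusters are all choices made here for illustration. In a stratified design the labels would presumably be shuffled within strata, and in a clustered design whole clusters would be reassigned together.

```python
import random
from statistics import mean, pstdev


def balance_statistic(covariates, treated):
    """Sum over covariates of squared standardized treated-minus-control mean differences.

    This statistic is an illustrative assumption, not the one derived in the paper.
    """
    n_vars = len(covariates[0])
    stat = 0.0
    for j in range(n_vars):
        col = [row[j] for row in covariates]
        sd = pstdev(col)
        if sd == 0:
            continue  # a constant covariate cannot be imbalanced
        t_mean = mean(x for x, z in zip(col, treated) if z)
        c_mean = mean(x for x, z in zip(col, treated) if not z)
        stat += ((t_mean - c_mean) / sd) ** 2
    return stat


def randomization_balance_test(covariates, treated, n_draws=2000, seed=0):
    """Monte Carlo randomization p-value for the omnibus statistic.

    Assumes complete randomization of individual units: every reshuffling
    of the treatment labels is treated as equally likely.
    """
    rng = random.Random(seed)
    observed = balance_statistic(covariates, treated)
    shuffled = list(treated)
    hits = 0
    for _ in range(n_draws):
        rng.shuffle(shuffled)  # one hypothetical re-randomization
        if balance_statistic(covariates, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the simulated p-value strictly positive.
    return observed, (hits + 1) / (n_draws + 1)


# Hypothetical data: one covariate, 20 units.
covs = [[float(i)] for i in range(20)]
separated = [i >= 10 for i in range(20)]      # treated units all have high values
alternating = [i % 2 == 1 for i in range(20)]  # treated units spread evenly

_, p_separated = randomization_balance_test(covs, separated)
_, p_alternating = randomization_balance_test(covs, alternating)
```

A small p-value here says that the observed imbalance would be unusual under the assumed randomization scheme; the simulation stands in for the easily computed large-sample p-values that the abstract asks about.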
