Covariate Selection in Propensity Scores Using Outcome Proxies

This study examined the practical problem of selecting covariates for propensity scores (PSs) from a predetermined set of candidate covariates. Because the bias reduction capacity of a confounding covariate is proportional to its concurrent relationships with the outcome and the treatment, I focused on how covariate-outcome relationships might be approximated while retaining the PS as a design tool (i.e., without using the observed outcomes). To make this approach tractable, I examined the extent to which alternative measures of the outcome can inform empirical covariate-outcome relationships and the corresponding covariate selection. Two such measures were examined: proximal pretreatment measures of the outcome and cross validation. Further, because the implications of covariate choice reach beyond the properties of the treatment effect estimator, I argue that the primary objective of PS covariate selection is to reduce bias effectively and efficiently while forming a scientific basis for inference through, for example, covariate balance. Outcome proxies and cross validation augment substantive knowledge with empirical evidence of covariates' bias reduction or amplification capacities, which can better inform covariate selection, improve estimation, and form an evidentiary basis for inference.
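The screening idea described above can be illustrated with a minimal sketch. It is not the study's procedure: the function name `screening_scores`, the simulated data, and the use of a product of absolute correlations as a rough score are all assumptions made for illustration. The sketch uses a pretreatment proxy of the outcome (a hypothetical pretest) in place of the observed outcome, so the PS remains a design tool.

```python
import numpy as np

def screening_scores(X, treatment, outcome_proxy):
    """Rough, illustrative screening of candidate PS covariates.

    Returns |corr(covariate, treatment)|, |corr(covariate, proxy)|,
    and their product. The product is a heuristic assumption here,
    not an estimator taken from the study.
    """
    X = np.asarray(X, dtype=float)
    z = np.asarray(treatment, dtype=float)
    p = np.asarray(outcome_proxy, dtype=float)

    def abs_corr(cols, v):
        # Absolute Pearson correlation of each column of `cols` with `v`.
        cols_c = cols - cols.mean(axis=0)
        v_c = v - v.mean()
        denom = np.sqrt((cols_c ** 2).sum(axis=0) * (v_c ** 2).sum())
        return np.abs(cols_c.T @ v_c) / denom

    r_treat = abs_corr(X, z)
    r_proxy = abs_corr(X, p)
    return r_treat, r_proxy, r_treat * r_proxy

# Simulated (hypothetical) data:
#   column 0 relates to both treatment and the proxy (confounder-like),
#   column 1 relates to treatment only,
#   column 2 relates to the proxy only.
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 3))
logit = 0.8 * X[:, 0] + 0.8 * X[:, 1]
treatment = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))
pretest = X[:, 0] + X[:, 2] + rng.normal(size=n)  # pretreatment outcome proxy

r_treat, r_proxy, score = screening_scores(X, treatment, pretest)
for j in range(X.shape[1]):
    print(f"covariate {j}: |r_treat|={r_treat[j]:.3f}  "
          f"|r_proxy|={r_proxy[j]:.3f}  score={score[j]:.3f}")
```

In this simulation, the covariate related to both treatment and the proxy receives the largest score, while the covariate related to treatment alone (the kind that may amplify bias) scores near zero. Marginal correlations are only a crude summary; they ignore joint relationships among covariates and should be read alongside substantive knowledge.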
