A model averaging approach for estimating propensity scores by optimizing balance

Many approaches, including traditional parametric modeling and machine learning techniques, have been proposed for estimating propensity scores. This paper describes a new model averaging approach to propensity score estimation in which parametric and nonparametric estimates are combined to achieve covariate balance. Simulation studies are conducted across scenarios that vary in the degree of interaction and nonlinearity in the treatment model. The results show that, when used for inverse probability weighting (IPW), the proposed propensity score estimator yields less bias and smaller standard errors than existing approaches. They also show that model averaging with the objective of minimizing the average Kolmogorov–Smirnov statistic leads to the best-performing IPW estimator. The proposed approach is also applied to a real data set to evaluate the causal effect of formula or mixed feeding versus exclusive breastfeeding on a child’s body mass index Z-score at age 4. The data analysis shows that formula or mixed feeding is more likely than exclusive breastfeeding to lead to obesity at age 4.
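
The idea of averaging two propensity score estimates to optimize balance can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a single mixing weight `lam` chosen by grid search, IPW weights for the average treatment effect, and two propensity score vectors (`ps_param`, `ps_nonparam`) that have already been fitted elsewhere. All function and parameter names are hypothetical.

```python
import numpy as np

def weighted_ks(x, treat, w):
    """KS distance between the weighted empirical CDFs of x in the two groups."""
    order = np.argsort(x)
    x, treat, w = x[order], treat[order], w[order]
    w1 = np.where(treat == 1, w, 0.0)
    w0 = np.where(treat == 0, w, 0.0)
    cdf1 = np.cumsum(w1) / w1.sum()
    cdf0 = np.cumsum(w0) / w0.sum()
    # Compare the CDFs only at the last position of each distinct value,
    # so tied observations do not create spurious intermediate gaps.
    last = np.r_[x[1:] != x[:-1], True]
    return float(np.max(np.abs(cdf1 - cdf0)[last]))

def average_ks(X, treat, ps):
    """Average weighted KS statistic over the columns of X under IPW weights."""
    ps = np.clip(ps, 1e-6, 1 - 1e-6)  # guard against division by zero
    w = treat / ps + (1 - treat) / (1 - ps)  # ATE-type IPW weights (assumed)
    return float(np.mean([weighted_ks(X[:, j], treat, w) for j in range(X.shape[1])]))

def average_propensity(X, treat, ps_param, ps_nonparam, grid=None):
    """Pick lam in [0, 1] so that lam*ps_param + (1-lam)*ps_nonparam
    minimizes the average KS statistic (grid search is an assumption)."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 21)
    best_lam, best_ks = None, np.inf
    for lam in grid:
        ks = average_ks(X, treat, lam * ps_param + (1 - lam) * ps_nonparam)
        if ks < best_ks:
            best_lam, best_ks = float(lam), ks
    best_ps = best_lam * ps_param + (1 - best_lam) * ps_nonparam
    return best_lam, best_ps, best_ks
```

Because the grid includes 0 and 1, the averaged score is never less balanced, by this criterion, than either input estimate on its own.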
