The specification of the propensity score in multilevel observational studies

Propensity Score Matching (PSM) has become a popular approach to the estimation of causal effects. It relies on the assumption that selection into treatment can be explained purely in terms of observable characteristics (the “unconfoundedness assumption”) and on the balancing property of the propensity score: conditional on the propensity score, the distribution of the observed covariates is the same for treated and control units. Several applications in the social sciences involve hierarchically structured data, with first-level units (e.g., individuals) clustered into groups (e.g., provinces). In this paper we explore the use of multilevel models to estimate the propensity score for such hierarchical data when one or more relevant cluster-level variables are unobserved. We compare this approach with alternatives such as a single-level model with cluster dummies. Using Monte Carlo evidence, we show that multilevel specifications usually achieve reasonably good balance on unobserved cluster-level covariates and consequently reduce omitted variable bias. The same holds for the dummy-variable model.
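To make the comparison concrete, here is a minimal Monte Carlo sketch in Python (NumPy only). It is not the authors' simulation design. The following are all illustrative assumptions:

- every parameter value (30 clusters of 50 units, a treatment model with logit P(T = 1) = 0.5x + z, 20 replications);
- the hypothetical helper names `fit_logit`, `smd_after_matching` and `run_simulation`;
- the use of 1:1 nearest-neighbour matching with replacement.

The sketch leaves out the multilevel (random-intercept) specification, which would need a mixed-model fitting routine. It only contrasts a single-level model that ignores the clustering with the cluster-dummy model. Balance on the unobserved cluster-level covariate z is measured by the standardized mean difference after matching.

```python
import numpy as np


def fit_logit(X, y, n_iter=25, ridge=1e-4):
    """Logistic regression by Newton-Raphson; returns fitted propensity scores.
    The tiny ridge term is an assumed numerical safeguard for clusters that
    happen to be all treated or all control (quasi-separation)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))
        grad = X.T @ (y - p) - ridge * beta
        hess = X.T @ (X * (p * (1 - p))[:, None]) + ridge * np.eye(X.shape[1])
        beta += np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))


def smd_after_matching(ps, treat, z):
    """1:1 nearest-neighbour matching on the propensity score, with replacement.
    Returns the standardized mean difference in z (treated vs matched controls),
    scaled by the pre-matching pooled standard deviation."""
    t_idx = np.flatnonzero(treat == 1)
    c_idx = np.flatnonzero(treat == 0)
    dist = np.abs(ps[t_idx][:, None] - ps[c_idx][None, :])
    matched = c_idx[dist.argmin(axis=1)]
    pooled_sd = np.sqrt((z[t_idx].var() + z[c_idx].var()) / 2)
    return (z[t_idx].mean() - z[matched].mean()) / pooled_sd


def run_simulation(n_reps=20, n_clusters=30, cluster_size=50, seed=0):
    """Average |SMD| on the unobserved cluster-level z for two PS specifications."""
    rng = np.random.default_rng(seed)
    cluster = np.repeat(np.arange(n_clusters), cluster_size)
    n = cluster.size
    dummies = (cluster[:, None] == np.arange(n_clusters)[None, :]).astype(float)
    results = {"pooled": [], "dummies": []}
    for _ in range(n_reps):
        z = rng.normal(size=n_clusters)[cluster]  # unobserved, constant within cluster
        x = rng.normal(size=n)                    # observed unit-level covariate
        p_true = 1.0 / (1.0 + np.exp(-(0.5 * x + z)))
        treat = (rng.uniform(size=n) < p_true).astype(float)
        # (a) single-level model ignoring clusters: intercept + x (z omitted)
        ps_pooled = fit_logit(np.column_stack([np.ones(n), x]), treat)
        # (b) single-level model with one dummy per cluster (no common intercept)
        ps_dummy = fit_logit(np.column_stack([x, dummies]), treat)
        results["pooled"].append(abs(smd_after_matching(ps_pooled, treat, z)))
        results["dummies"].append(abs(smd_after_matching(ps_dummy, treat, z)))
    return {k: float(np.mean(v)) for k, v in results.items()}


if __name__ == "__main__":
    print(run_simulation())
```

Because z is constant within clusters, the cluster intercepts of the dummy model can absorb it. Matching on that estimated score therefore tends to balance z as well. A model with x alone cannot do this. In the sketch, the difference shows up as a smaller average absolute standardized difference for the dummy specification.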
