Propensity score matching with clustered data. An application to the estimation of the impact of caesarean section on the Apgar score

This article focuses on the implementation of propensity score matching for clustered data. Different approaches to reducing bias due to cluster-level confounders are considered and compared using Monte Carlo simulations. We investigated methods that exploit the clustered structure of the data in two ways: in the estimation of the propensity score model (through the inclusion of fixed or random effects) or in the implementation of the matching algorithm. In addition to pure within-cluster matching, we also assessed the performance of a new approach, 'preferential' within-cluster matching. This approach first searches within the same cluster for control units to be matched to each treated unit. If no match is possible within the cluster, the algorithm then searches in other clusters. All the approaches considered successfully reduced the bias due to the omission of a cluster-level confounder. The preferential within-cluster matching approach, which combines the advantages of within-cluster and between-cluster matching, performed relatively well with both large and small clusters, and it was often the best method. An important advantage of this approach is that it reduces the number of unmatched units compared with pure within-cluster matching. We applied these methods to the estimation of the effect of caesarean section on the Apgar score using birth register data. Copyright © 2016 John Wiley & Sons, Ltd.
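
The two-stage search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes 1:1 nearest-neighbour matching on an already estimated propensity score, a fixed caliper, and matching without replacement, none of which the abstract specifies. The names `Unit`, `preferential_match` and `fallback`, as well as the toy data, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Unit:
    uid: str        # hypothetical unit identifier
    cluster: str    # cluster label, e.g. a hospital
    treated: bool   # True for treated units, False for controls
    pscore: float   # estimated propensity score (assumed to be given)


def _nearest(target: Unit, pool: List[Unit], used: Set[str],
             caliper: float) -> Optional[Unit]:
    """Return the closest unused control within the caliper, or None."""
    best: Optional[Unit] = None
    best_dist = float("inf")
    for c in pool:
        if c.uid in used:
            continue
        dist = abs(c.pscore - target.pscore)
        if dist <= caliper and dist < best_dist:
            best, best_dist = c, dist
    return best


def preferential_match(units: List[Unit], caliper: float,
                       fallback: bool = True) -> Dict[str, str]:
    """Map treated uid -> matched control uid.

    Stage 1 searches within the treated unit's own cluster.
    Stage 2 (only if fallback=True) searches the other clusters for
    treated units left unmatched after stage 1.
    With fallback=False this reduces to pure within-cluster matching.
    """
    treated = [u for u in units if u.treated]
    controls = [u for u in units if not u.treated]
    used: Set[str] = set()          # controls already matched (no replacement)
    matches: Dict[str, str] = {}

    # Stage 1: within-cluster search.
    for t in treated:
        same = [c for c in controls if c.cluster == t.cluster]
        m = _nearest(t, same, used, caliper)
        if m is not None:
            matches[t.uid] = m.uid
            used.add(m.uid)

    # Stage 2: between-cluster search for the remaining treated units.
    if fallback:
        for t in treated:
            if t.uid in matches:
                continue
            others = [c for c in controls if c.cluster != t.cluster]
            m = _nearest(t, others, used, caliper)
            if m is not None:
                matches[t.uid] = m.uid
                used.add(m.uid)
    return matches


# Toy data (made up for illustration only).
units = [
    Unit("t1", "A", True, 0.60),
    Unit("c1", "A", False, 0.62),
    Unit("t2", "A", True, 0.30),
    Unit("c2", "B", False, 0.31),
    Unit("t3", "B", True, 0.90),
    Unit("c3", "B", False, 0.50),
]

print(preferential_match(units, caliper=0.05))                  # {'t1': 'c1', 't2': 'c2'}
print(preferential_match(units, caliper=0.05, fallback=False))  # {'t1': 'c1'}
```

In this toy example, pure within-cluster matching leaves `t2` unmatched because the only control in its cluster is already used, whereas the preferential variant finds a match for it in cluster B. Unit `t3` remains unmatched in both cases because no control lies within the caliper.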
