Machine learning for the estimation of the propensity score: a simulation study

Despite the extensive literature on propensity score (PS) methods, several open questions remain about their implementation. Based on the results of an extensive simulation exercise, we address some of these questions and provide guidelines for applied researchers. The first question we consider is which method should be preferred to estimate the PS. We compare machine learning techniques (MLT) with standard logit models by analyzing the performance of the different PS estimators in matching (M) and weighting (W) via Monte Carlo simulations. Second, we use the simulation framework to assess how well several measures of covariate balance predict the quality of the PS weighting and matching estimators, measured as bias reduction in the average treatment effect on the treated (ATT). With few exceptions, weighting estimators outperform matching estimators across simulation scenarios in terms of bias reduction. Conditional on M or W, random forests, followed by logit, gave the lowest bias, while tree methods were competitive only when weighting, and neural networks and naive Bayes only with large data sets. The balance diagnostic with the highest association with the bias was the ASAM with the inclusion of interaction terms, but the association was not significantly different from that of the classic ASAM. Less commonly used metrics (AUC, ECDF, variance ratio) were only weakly associated with the bias.
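
To make the quantities named above concrete, the following is a minimal sketch, not the simulation design used in the study. It assumes a logit PS model fitted by Newton-Raphson, ATT weights of 1 for treated units and ps/(1 - ps) for controls, and an ASAM computed as the mean over covariates of the absolute weighted mean difference divided by the treated-group standard deviation. The data-generating process, the function names (fit_logit, att_weights, asam, att_estimate) and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_logit(X, t, n_iter=25):
    """Fit a logit model by Newton-Raphson; returns coefficients, intercept first."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        grad = Xd.T @ (t - p)
        hess = Xd.T @ (Xd * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def predict_ps(X, beta):
    """Estimated propensity score for each row of X."""
    Xd = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Xd @ beta))


def att_weights(t, ps):
    """ATT weights: 1 for treated units, odds ps / (1 - ps) for controls."""
    return np.where(t == 1, 1.0, ps / (1.0 - ps))


def asam(X, t, w):
    """Mean over covariates of |weighted mean difference| / treated-group SD."""
    treated = t == 1
    mean_t = np.average(X[treated], axis=0, weights=w[treated])
    mean_c = np.average(X[~treated], axis=0, weights=w[~treated])
    sd_t = X[treated].std(axis=0, ddof=1)
    return float(np.mean(np.abs(mean_t - mean_c) / sd_t))


def att_estimate(y, t, w):
    """Difference in weighted outcome means between treated and control units."""
    treated = t == 1
    return float(
        np.average(y[treated], weights=w[treated])
        - np.average(y[~treated], weights=w[~treated])
    )


# Hypothetical data: three confounders and a constant treatment effect of 1.0,
# so the true ATT is 1.0 by construction.
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 3))
true_ps = 1.0 / (1.0 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1] + 0.3 * X[:, 2])))
t = (rng.uniform(size=n) < true_ps).astype(int)
y = 1.0 * t + X @ np.array([1.0, 1.0, 0.5]) + rng.normal(size=n)

ps_hat = predict_ps(X, fit_logit(X, t))
w = att_weights(t, ps_hat)
unit_w = np.ones(n)

asam_raw, asam_weighted = asam(X, t, unit_w), asam(X, t, w)
att_naive, att_weighted = att_estimate(y, t, unit_w), att_estimate(y, t, w)

print("ASAM before / after weighting:", asam_raw, asam_weighted)
print("ATT naive / weighted:", att_naive, att_weighted)
```

In this sketch the PS model is correctly specified, so weighting should both lower the ASAM and move the ATT estimate toward its true value. Swapping fit_logit for another PS estimator and repeating the draw many times would give a toy version of the kind of comparison described above.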