Generalized Optimal Matching Methods for Causal Inference

We develop an encompassing framework, called generalized optimal matching (GOM), for matching, covariate balancing, and doubly robust methods for causal inference from observational data. The framework is obtained by generalizing a new functional-analytical formulation of optimal matching. This gives rise to the class of GOM methods, for which we provide a single unified theory to analyze tractability, consistency, and efficiency. Many commonly used existing methods are included in GOM and, using their GOM interpretation, can be extended to optimally and automatically trade off balance for variance and thereby outperform their standard counterparts. As a subclass, GOM gives rise to kernel optimal matching (KOM), which, as supported by new theoretical and empirical results, is notable for combining many of the positive properties of other methods in one. KOM, which is solved as a linearly constrained convex-quadratic optimization problem, inherits both the interpretability and model-free consistency of matching but can also achieve the $\sqrt{n}$-consistency of well-specified regression and the efficiency and robustness of doubly robust methods. In settings of limited overlap, KOM enables a transparent method of interval estimation for partial identification and robust coverage. We demonstrate these benefits in examples with both synthetic and real data.
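
For concreteness, one way the linearly constrained convex-quadratic problem behind KOM might be written, for the average effect on the treated, is sketched here. The notation is assumed for illustration and does not come from the abstract: $K$ is a positive semidefinite kernel on the covariates $X_i$, $\mathcal{T}$ and $\mathcal{C}$ index the $n_1$ treated units and the control units, $W$ is the vector of control weights, and $\lambda \ge 0$ is a penalty that trades balance for variance.

$$
\min_{W \ge 0,\ \sum_{i \in \mathcal{C}} W_i = 1} \;
\sum_{i,j \in \mathcal{C}} W_i W_j K(X_i, X_j)
\;-\; \frac{2}{n_1} \sum_{i \in \mathcal{T}} \sum_{j \in \mathcal{C}} W_j K(X_i, X_j)
\;+\; \frac{1}{n_1^2} \sum_{i,j \in \mathcal{T}} K(X_i, X_j)
\;+\; \lambda \sum_{i \in \mathcal{C}} W_i^2
$$

Under these assumptions, the first three terms are the squared worst-case discrepancy in means between the treated units and the reweighted controls over the unit ball of the reproducing kernel Hilbert space of $K$. The objective is convex and quadratic in $W$ because $K$ is positive semidefinite, and the constraints are linear, which is consistent with the description of KOM above.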
