Causal Inference by Minimizing the Dual Norm of Bias: Kernel Matching & Weighting Estimators for Causal Effects

We consider the problem of estimating causal effects from observational data and propose a novel framework for matching- and weighting-based causal estimators. The framework expresses the bias of a causal estimator as an operator acting on the unknown conditional expectation function of outcomes, and defines the dual norm of the bias as the norm of this operator with respect to a function space that encodes the assumed structure of outcomes. We call estimators that minimize this quantity for some function space worst-case bias minimizing (WCBM), and we show that a wide variety of existing causal estimators belong to this family, including one-to-one matching (with or without replacement), coarsened exact matching, and mean-matched sampling. We then propose a range of new kernel-based matching and weighting estimators that arise from minimizing the dual norm of the bias with respect to a reproducing kernel Hilbert space. Depending on the case, these estimators can be computed in closed form, by quadratic optimization, or by integer optimization. We show that estimators based on universal kernels are consistent for the causal effect. In numerical experiments, the new kernel-based estimators outperform all standard causal estimators in estimation error, striking a successful balance between generality and efficiency.
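To make the closed-form case concrete, the following is a minimal sketch of one kernel weighting scheme in this spirit, not the paper's exact estimator. It assumes the target is the effect on the treated, a Gaussian kernel, and weights on control units that sum to one but may be negative; the ridge term and all function names (`gaussian_kernel`, `worst_case_bias_sq`, `kernel_weights`) are illustrative choices made here. Over the unit ball of a reproducing kernel Hilbert space, the worst-case squared bias of a weighted control mean reduces to the squared distance between the weighted control embedding and the treated mean embedding, which is a quadratic form in the weights.

```python
import numpy as np


def gaussian_kernel(A, B, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def worst_case_bias_sq(w, X_control, X_treated, bandwidth=1.0):
    """Squared RKHS distance between the weighted control embedding and the
    treated mean embedding. Under the assumptions stated above, this is the
    squared worst-case bias over the unit ball of the RKHS."""
    n1 = X_treated.shape[0]
    u = np.full(n1, 1.0 / n1)  # uniform weights on treated units
    K_cc = gaussian_kernel(X_control, X_control, bandwidth)
    K_ct = gaussian_kernel(X_control, X_treated, bandwidth)
    K_tt = gaussian_kernel(X_treated, X_treated, bandwidth)
    return w @ K_cc @ w - 2.0 * w @ K_ct @ u + u @ K_tt @ u


def kernel_weights(X_control, X_treated, bandwidth=1.0, ridge=1e-3):
    """Closed-form minimizer of w'(K_cc + ridge*I)w - 2 w'b subject to sum(w) = 1.

    The ridge penalty is an assumption added for numerical stability.
    Nonnegativity of the weights is NOT enforced; enforcing it would turn
    the problem into a quadratic program."""
    n0 = X_control.shape[0]
    K_cc = gaussian_kernel(X_control, X_control, bandwidth)
    K_ct = gaussian_kernel(X_control, X_treated, bandwidth)
    b = K_ct.mean(axis=1)  # kernel mean of treated units at each control point
    A = K_cc + ridge * np.eye(n0)
    A_inv_b = np.linalg.solve(A, b)
    A_inv_1 = np.linalg.solve(A, np.ones(n0))
    # Lagrange multiplier for the sum-to-one constraint.
    nu = (1.0 - A_inv_b.sum()) / A_inv_1.sum()
    return A_inv_b + nu * A_inv_1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_c = rng.normal(size=(200, 2))           # synthetic control covariates
    X_t = rng.normal(loc=0.5, size=(50, 2))   # synthetic treated covariates
    w = kernel_weights(X_c, X_t)
    uniform = np.full(X_c.shape[0], 1.0 / X_c.shape[0])
    print("weighted:", worst_case_bias_sq(w, X_c, X_t))
    print("uniform: ", worst_case_bias_sq(uniform, X_c, X_t))
```

Given control outcomes `Y_c` and treated outcomes `Y_t` (hypothetical arrays), the corresponding effect estimate would be `Y_t.mean() - w @ Y_c`. Restricting the weights to be nonnegative, or to select a subset of units, leads to the quadratic and integer optimization cases mentioned in the abstract.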
