Kernel Instrumental Variable Regression

Instrumental variable (IV) regression is a strategy for learning causal relationships in observational data. If measurements of input X and output Y are confounded, the causal relationship can nonetheless be identified if an instrumental variable Z is available that influences X directly, but is conditionally independent of Y given X and the unmeasured confounder. The classic two-stage least squares algorithm (2SLS) simplifies the estimation problem by modeling all relationships as linear functions. We propose kernel instrumental variable regression (KIV), a nonparametric generalization of 2SLS, modeling relations among X, Y, and Z as nonlinear functions in reproducing kernel Hilbert spaces (RKHSs). We prove the consistency of KIV under mild assumptions, and derive conditions under which convergence occurs at the minimax optimal rate for unconfounded, single-stage RKHS regression. In doing so, we obtain an efficient ratio between training sample sizes used in the algorithm's first and second stages. In experiments, KIV outperforms state-of-the-art alternatives for nonparametric IV regression.
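The two-stage structure described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Gaussian kernels with a fixed lengthscale, hand-picked regularization parameters (`lam`, `xi`), a small diagonal jitter for numerical stability, and simulated data. The function names (`gaussian_kernel`, `kiv_fit`, `kiv_predict`) are hypothetical. Stage 1 is a kernel ridge regression from Z to the features of X on one sample; stage 2 is a ridge regression of Y on the predicted features using a second sample, and the closed form below follows from writing out those two ridge problems.

```python
import numpy as np


def gaussian_kernel(a, b, lengthscale=1.0):
    """Gram matrix with entries exp(-||a_i - b_j||^2 / (2 * lengthscale^2))."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * lengthscale ** 2))


def kiv_fit(x1, z1, z2, y2, lam=1e-2, xi=1e-2, jitter=1e-8):
    """Two-stage kernel ridge sketch.

    x1, z1: first-stage sample, shapes (n, d_x) and (n, d_z).
    z2, y2: second-stage sample, shapes (m, d_z) and (m,).
    lam, xi, jitter: illustrative values, assumed rather than tuned.
    """
    n, m = x1.shape[0], z2.shape[0]
    K_xx = gaussian_kernel(x1, x1)
    K_zz = gaussian_kernel(z1, z1)
    K_z_z2 = gaussian_kernel(z1, z2)  # shape (n, m)

    # Stage 1: ridge regression from Z to the features of X.
    # Column j of W holds the inner products between each psi(x_i)
    # and the predicted mean embedding at z2[j].
    W = K_xx @ np.linalg.solve(K_zz + n * lam * np.eye(n), K_z_z2)

    # Stage 2: ridge regression of y2 on the predicted embeddings,
    # with h represented as sum_i alpha_i * psi(x_i).
    A = W @ W.T + m * xi * K_xx + jitter * np.eye(n)
    return np.linalg.solve(A, W @ y2)


def kiv_predict(alpha, x1, x_new):
    """Evaluate the fitted structural function at new inputs."""
    return gaussian_kernel(x_new, x1) @ alpha


# Simulated confounded data (an assumption for illustration only).
rng = np.random.default_rng(0)
n, m = 200, 200
z = rng.normal(size=(n + m, 1))             # instrument
e = rng.normal(size=(n + m, 1))             # unmeasured confounder
x = z + e + 0.1 * rng.normal(size=(n + m, 1))
y = (np.sin(x) + e + 0.1 * rng.normal(size=(n + m, 1))).ravel()

# Split the data: the first n points feed stage 1, the remaining m feed stage 2.
alpha = kiv_fit(x[:n], z[:n], z[n:], y[n:])

x_grid = np.linspace(-2.0, 2.0, 50).reshape(-1, 1)
h_hat = kiv_predict(alpha, x[:n], x_grid)
```

With linear kernels in place of the Gaussian kernel, the same two steps reduce to a ridge-regularized form of 2SLS. In practice the lengthscale and both regularization parameters would be chosen by validation, and the split between `n` and `m` is a separate design choice.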
