Maximum Moment Restriction for Instrumental Variable Regression

We propose a simple framework for nonlinear instrumental variable (IV) regression based on a kernelized conditional moment restriction (CMR) known as the maximum moment restriction (MMR). The MMR is formulated by maximizing the interaction between the residual and functions of the instruments that belong to the unit ball of a reproducing kernel Hilbert space (RKHS). This allows us to cast IV regression as empirical risk minimization, where the risk depends on the reproducing kernel on the instrument and can be estimated by a U-statistic or a V-statistic. This simplification not only enables us to derive elegant theoretical analyses in both parametric and non-parametric settings, but also yields easy-to-use algorithms with a justified hyper-parameter selection procedure. We demonstrate the advantages of our framework over existing ones in experiments on both synthetic and real-world data.
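To make the U-statistic and V-statistic estimates concrete, the following is a minimal sketch and not the authors' implementation. It assumes the risk takes the quadratic form R(f) = E[(Y - f(X)) k(Z, Z') (Y' - f(X'))], where (X', Y', Z') is an independent copy of (X, Y, Z) and k is the kernel on the instrument. The Gaussian kernel, the bandwidth, the simulated data, and the names `gaussian_kernel` and `mmr_risk` are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(z, bandwidth=1.0):
    """Gram matrix with entries exp(-||z_i - z_j||^2 / (2 * bandwidth^2)).

    The Gaussian kernel is an assumed choice for the instrument kernel.
    """
    z = np.asarray(z, dtype=float)
    z = z.reshape(z.shape[0], -1)  # treat 1-D input as n samples of dimension 1
    sq_dists = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmr_risk(residuals, K, kind="V"):
    """Empirical kernelized risk from residuals r_i = y_i - f(x_i).

    kind="V": (1 / n^2) * sum over all pairs (i, j) of r_i K_ij r_j
    kind="U": (1 / (n (n - 1))) * sum over pairs with i != j
    """
    r = np.asarray(residuals, dtype=float)
    n = r.shape[0]
    quad = r @ K @ r
    if kind == "V":
        return quad / n ** 2
    if kind == "U":
        diag = np.sum(np.diag(K) * r ** 2)  # remove the i == j terms
        return (quad - diag) / (n * (n - 1))
    raise ValueError("kind must be 'U' or 'V'")

# Hypothetical confounded data: e affects both the treatment x and the outcome y.
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)                  # instrument
e = rng.normal(size=n)                  # unobserved confounder
x = z + e + 0.1 * rng.normal(size=n)    # treatment
y = 2.0 * x + e + 0.1 * rng.normal(size=n)

K = gaussian_kernel(z)
ols_slope = (x @ y) / (x @ x)           # least squares ignores the confounding
risk_true = mmr_risk(y - 2.0 * x, K, kind="U")
risk_ols = mmr_risk(y - ols_slope * x, K, kind="U")
```

In this simulation the residual of the structural function, y - 2x, is independent of the instrument, so `risk_true` should be close to zero, while the least-squares fit leaves a residual that still depends on z. Because the Gram matrix is positive semi-definite, the V-statistic is never negative; the U-statistic can be slightly negative in finite samples.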

[1]  Kenji Fukumizu,et al.  Universality, Characteristic Kernels and RKHS Embedding of Measures , 2010, J. Mach. Learn. Res..

[2]  Dylan S. Small,et al.  A review of instrumental variable estimators for Mendelian randomization , 2015, Statistical methods in medical research.

[3]  Joshua D. Angrist,et al.  Identification of Causal Effects Using Instrumental Variables , 1993 .

[4]  W. Hoeffding Probability Inequalities for sums of Bounded Random Variables , 1963 .

[5]  Arthur Gretton,et al.  Kernel Instrumental Variable Regression , 2019, NeurIPS.

[6]  Geoffrey E. Hinton,et al.  Deep Learning , 2015, Nature.

[7]  Robert Hable,et al.  Asymptotic normality of support vector machine variants and other regularized kernel methods , 2010, J. Multivar. Anal..

[8]  P. Hall,et al.  Nonparametric methods for inference in the presence of instrumental variables , 2003, math/0603130.

[9]  L. Hansen Large Sample Properties of Generalized Method of Moments Estimators , 1982 .

[10]  Ole Winther,et al.  Bayesian Leave-One-Out Cross-Validation Approximations for Gaussian Latent Variable Models , 2014, J. Mach. Learn. Res..

[11]  Matthias W. Seeger,et al.  Using the Nyström Method to Speed Up Kernel Machines , 2000, NIPS.

[12]  Peihua Qiu,et al.  Generalized Least Squares , 2005, Technometrics.

[13]  Xiaohong Chen,et al.  Semi‐Nonparametric IV Estimation of Shape‐Invariant Engel Curves , 2003 .

[14]  T. Martinussen,et al.  Instrumental Variable Estimation with the R Package ivtools , 2019, Epidemiologic Methods.

[15]  Anthony Widjaja,et al.  Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2003, IEEE Transactions on Neural Networks.

[16]  Bernhard Schölkopf,et al.  Kernel Distribution Embeddings: Universal Kernels, Characteristic Kernels and Kernel Metrics on Distributions , 2016, J. Mach. Learn. Res..

[17]  Ding-Xuan Zhou,et al.  The covering number in learning theory , 2002, J. Complex..

[18]  Christopher K. I. Williams,et al.  Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning) , 2005 .

[19]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[20]  J. Horowitz Applied Nonparametric Instrumental Variables Estimation , 2011 .

[21]  Ing Rj Ser Approximation Theorems of Mathematical Statistics , 1980 .

[22]  Jon A. Wellner,et al.  Weak Convergence and Empirical Processes: With Applications to Statistics , 1996 .

[23]  Le Song,et al.  A unified kernel framework for nonparametric inference in graphical models ] Kernel Embeddings of Conditional Distributions , 2013 .

[24]  A. Wald,et al.  On Stochastic Limit and Order Relationships , 1943 .

[25]  N. Aronszajn Theory of Reproducing Kernels. , 1950 .

[26]  Xiaohong Chen,et al.  Efficient Estimation of Models with Conditional Moment Restrictions Containing Unknown Functions , 2003 .

[27]  S. Penckofer,et al.  The Role of Vitamin D in the Aging Adult , 2014, Journal of aging and gerontology.

[28]  Adrian N. Bishop,et al.  Fast Bayesian Intensity Estimation for the Permanental Process , 2017, ICML.

[29]  Michael I. Jordan,et al.  Dimensionality Reduction for Supervised Learning with Reproducing Kernel Hilbert Spaces , 2004, J. Mach. Learn. Res..

[30]  A. Berlinet,et al.  Reproducing kernel Hilbert spaces in probability and statistics , 2004 .

[31]  Vitalii P. Tanana,et al.  Theory of Linear Ill-Posed Problems and its Applications , 2002 .

[32]  Xiaohong Chen,et al.  Estimation of Nonparametric Conditional Moment Models with Possibly Nonsmooth Generalized Residuals , 2009 .

[33]  Michael G. Akritas,et al.  Empirical processes associated with V-statistics and a class of estimators under random censoring , 1986 .

[34]  Andrew Bennett,et al.  Deep Generalized Method of Moments for Instrumental Variable Analysis , 2019, NeurIPS.

[35]  James R. Staley,et al.  A robust and efficient method for Mendelian randomization with hundreds of genetic variants , 2020, Nature Communications.

[36]  Stephen G. Donald,et al.  Choosing the Number of Instruments , 2001 .

[37]  William H. Press,et al.  Numerical recipes in C , 2002 .

[38]  D. Luenberger Optimization by Vector Space Methods , 1968 .

[39]  Krikamol Muandet,et al.  Dual Instrumental Variable Regression , 2019, NeurIPS.

[40]  Olaf H. Klungel,et al.  Instrumental Variable Analysis in Epidemiologic Studies: An Overview of the Estimation Methods , 2015 .

[41]  Alexander J. Smola,et al.  Hilbert space embeddings of conditional distributions with applications to dynamical systems , 2009, ICML '09.

[42]  David Card The Causal Effect of Education on Learning , 1999 .

[43]  Joshua D. Angrist,et al.  Mostly Harmless Econometrics: An Empiricist's Companion , 2008 .

[44]  Krikamol Muandet,et al.  Kernel Conditional Moment Test via Maximum Moment Restriction , 2020, UAI.

[45]  J. Florens,et al.  Nonparametric Instrumental Regression , 2010 .

[46]  Nishanth Dikkala,et al.  Minimax Estimation of Conditional Moment Models , 2020, NeurIPS.

[47]  Ingo Steinwart,et al.  On the Influence of the Kernel on the Consistency of Support Vector Machines , 2002, J. Mach. Learn. Res..

[48]  W. Newey,et al.  Instrumental variable estimation of nonparametric models , 2003 .

[49]  Xiaohong Chen,et al.  Optimal Sup-Norm Rates and Uniform Inference on Nonlinear Functionals of Nonparametric IV Regression , 2015, 1508.03365.

[50]  Andreas Christmann,et al.  Support vector machines , 2008, Data Mining and Knowledge Discovery Handbook.

[51]  Kevin Leyton-Brown,et al.  Deep IV: A Flexible Approach for Counterfactual Prediction , 2017, ICML.

[52]  Fernando Pires Hartwig,et al.  Robust inference in summary data Mendelian randomization via the zero modal pleiotropy assumption , 2017, bioRxiv.

[53]  Vasilis Syrgkanis,et al.  Adversarial Generalized Method of Moments , 2018, ArXiv.

[54]  Alexandre B. Tsybakov,et al.  Introduction to Nonparametric Estimation , 2008, Springer series in statistics.

[55]  Bernhard Schölkopf,et al.  A Generalized Representer Theorem , 2001, COLT/EuroCOLT.

[56]  Bernhard Schölkopf,et al.  Kernel Mean Embedding of Distributions: A Review and Beyonds , 2016, Found. Trends Mach. Learn..