
Learning Deep Features in Instrumental Variable Regression

Instrumental variable (IV) regression is a standard strategy for learning causal relationships between confounded treatment and outcome variables from observational data by utilizing an instrumental variable, which affects the outcome only through the treatment. In classical IV regression, learning proceeds in two stages: stage 1 performs linear regression from the instrument to the treatment; and stage 2 performs linear regression from the treatment to the outcome, conditioned on the instrument. We propose a novel method, deep feature instrumental variable regression (DFIV), to address the case where relations between instruments, treatments, and outcomes may be nonlinear. In this case, deep neural nets are trained to define informative nonlinear features on the instruments and treatments. We propose an alternating training regime for these features to ensure good end-to-end performance when composing stages 1 and 2, thus obtaining highly flexible feature maps in a computationally efficient manner. DFIV outperforms recent state-of-the-art methods on challenging IV benchmarks, including settings involving high dimensional image data. DFIV also exhibits competitive performance in off-policy policy evaluation for reinforcement learning, which can be understood as an IV regression task.
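The classical two-stage procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration of linear two-stage least squares on synthetic data, not the DFIV method itself. The data-generating process, the coefficients, and the function names `two_stage_least_squares` and `simulate` are assumptions made for the example.

```python
import numpy as np


def two_stage_least_squares(z, x, y):
    """Classical linear IV regression with intercepts (illustrative sketch)."""
    n = len(z)
    Z = np.column_stack([np.ones(n), z])
    # Stage 1: linear regression from the instrument to the treatment.
    beta1, *_ = np.linalg.lstsq(Z, x, rcond=None)
    x_hat = Z @ beta1
    # Stage 2: linear regression from the predicted treatment to the outcome.
    X_hat = np.column_stack([np.ones(n), x_hat])
    beta2, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return beta2  # [intercept, estimated causal slope]


def simulate(n=20000, seed=0):
    """Synthetic confounded data (assumed setup): true causal slope is 2.0."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)  # instrument: affects y only through x
    u = rng.normal(size=n)  # hidden confounder: affects both x and y
    x = z + u + 0.1 * rng.normal(size=n)
    y = 2.0 * x - 3.0 * u + 0.1 * rng.normal(size=n)
    return z, x, y


z, x, y = simulate()
iv_slope = two_stage_least_squares(z, x, y)[1]
ols_slope = np.polyfit(x, y, 1)[0]  # naive regression, biased by u
print(f"IV slope:  {iv_slope:.3f}")
print(f"OLS slope: {ols_slope:.3f}")
```

In this setup the naive regression of the outcome on the treatment is biased by the confounder, while the two-stage estimate should land close to the true slope of 2.0. According to the abstract, DFIV replaces the linear maps in both stages with features learned by neural networks; that part is not shown here.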

Nando de Freitas | A. Doucet | A. Gretton | Yutian Chen | Siddarth Srinivasan | Liyuan Xu
