Non-Parametric Inference Adaptive to Intrinsic Dimension

We consider non-parametric estimation and inference of conditional moment models in high dimensions. We show that even when the dimension $D$ of the conditioning variable is larger than the sample size $n$, estimation and inference are feasible as long as the distribution of the conditioning variable has small intrinsic dimension $d$, as measured by a locally low doubling measure. Our estimator is a sub-sampled ensemble of $k$-nearest neighbor ($k$-NN) $Z$-estimators. We show that if the intrinsic dimension of the covariate distribution equals $d$, then the finite-sample estimation error of our estimator is of order $n^{-1/(d+2)}$ and our estimate is $n^{1/(d+2)}$-asymptotically normal, irrespective of $D$. The sub-sampling size required to achieve these results depends on the unknown intrinsic dimension $d$. We propose an adaptive, data-driven method for choosing this parameter and prove that it achieves the desired rates. We discuss extensions and applications to heterogeneous treatment effect estimation.
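To make the construction concrete, below is a minimal sketch, assuming the simplest conditional moment model $\mathbb{E}[Y - \theta(x) \mid X = x] = 0$, for which each $k$-NN $Z$-estimator reduces to a $k$-NN average of the responses. The helper names (`knn_mean`, `subsampled_knn_ensemble`) and the fixed tuning parameters `k`, `s`, and `B` are hypothetical illustrations; in particular, the sketch fixes the sub-sample size by hand rather than choosing it adaptively to the unknown intrinsic dimension $d$ as the paper does.

```python
import numpy as np

def knn_mean(X_sub, Y_sub, x, k):
    """Z-estimator for E[Y - theta | X = x] = 0 on one subsample:
    solving the empirical moment over the k nearest neighbors of x
    reduces to averaging their responses."""
    dists = np.linalg.norm(X_sub - x, axis=1)
    neighbors = np.argsort(dists)[:k]
    return Y_sub[neighbors].mean()

def subsampled_knn_ensemble(X, Y, x, k, s, B, seed=0):
    """Average the k-NN Z-estimate over B subsamples of size s,
    each drawn without replacement (a sub-sampled ensemble)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    estimates = [
        knn_mean(X[idx], Y[idx], x, k)
        for idx in (rng.choice(n, size=s, replace=False) for _ in range(B))
    ]
    return float(np.mean(estimates))

# Covariates with ambient dimension D = 50 but intrinsic dimension d = 2.
rng = np.random.default_rng(1)
n, D, d = 2000, 50, 2
latent = rng.normal(size=(n, d))           # d-dimensional latent positions
X = latent @ rng.normal(size=(d, D))       # embedded in R^D
Y = np.sin(latent[:, 0]) + 0.1 * rng.normal(size=n)

theta_hat = subsampled_knn_ensemble(X, Y, x=X[0], k=10, s=200, B=100)
print(theta_hat, np.sin(latent[0, 0]))     # estimate vs. target value
```

The example places the covariates on a 2-dimensional subspace of $\mathbb{R}^{50}$, the regime the abstract describes: the accuracy of the ensemble is governed by the intrinsic dimension $d$, not the ambient dimension $D$.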
