Estimating multi-index models with response-conditional least squares

The multi-index model is a simple yet powerful high-dimensional regression model that circumvents the curse of dimensionality by assuming $\mathbb{E}[Y \mid X] = g(A^\top X)$ for some unknown index space $A$ and link function $g$. In this paper we introduce a method for estimating the index space, and study how the error of an index space estimate propagates into the subsequent regression of the link function. The proposed method approximates the index space by the span of linear regression slope coefficients computed over level sets of the data. Being based on ordinary least squares, our approach is easy to implement and computationally efficient. We prove a tight concentration bound that shows $N^{-1/2}$-convergence, and that also faithfully describes the dependence on the chosen partition of level sets, thereby offering guidance for hyperparameter tuning. The estimator's competitiveness is confirmed by extensive comparisons with state-of-the-art methods, on both synthetic and real data sets. As a second contribution, we establish minimax-optimal generalization bounds for $k$-nearest neighbor and piecewise polynomial regression when trained on samples projected onto any $N^{-1/2}$-consistent estimate of the index space, thus providing complete and provable estimation of the multi-index model.
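To make the estimator concrete, the following is a minimal sketch of response-conditional least squares in Python. All names (`rcls_index_estimate`, `n_slices`) are illustrative, and the quantile-based slicing is one natural way to form the level sets; the paper's actual partition and refinements may differ.

```python
import numpy as np

def rcls_index_estimate(X, Y, n_slices, d):
    """Illustrative sketch of response-conditional least squares.

    X : (N, D) covariates, Y : (N,) responses,
    n_slices : number of level sets, d : index space dimension.
    Returns a (D, d) orthonormal basis for the estimated index space
    (assumes at least d well-populated slices).
    """
    N, D = X.shape
    # Partition the sample into level sets of the response via quantiles,
    # so every slice contains roughly N / n_slices points.
    edges = np.quantile(Y, np.linspace(0.0, 1.0, n_slices + 1))
    edges[-1] += 1e-12  # make the last bin right-inclusive
    slopes = []
    for j in range(n_slices):
        idx = (Y >= edges[j]) & (Y < edges[j + 1])
        Xj, Yj = X[idx], Y[idx]
        if len(Yj) <= D:          # too few points for a stable OLS fit
            continue
        # Ordinary least squares with intercept on the j-th level set;
        # the slope vector lies (approximately) in the index space.
        A = np.column_stack([np.ones(len(Yj)), Xj])
        coef, *_ = np.linalg.lstsq(A, Yj, rcond=None)
        slopes.append(coef[1:])   # drop the intercept
    # The span of the slopes approximates the index space: take the
    # top-d left singular vectors of the stacked slope matrix.
    S = np.stack(slopes, axis=1)  # (D, number of valid slices)
    U, _, _ = np.linalg.svd(S, full_matrices=False)
    return U[:, :d]
```

The concentration bound's dependence on the partition suggests choosing `n_slices` large enough to resolve the variation of $g$ along the index space, yet small enough that each slice retains well over $D$ points for a stable OLS fit.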

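For the second contribution, a toy two-stage pipeline (again only a hedged sketch, with a synthetic link function chosen here for illustration) would project the data onto the estimated index space and run an off-the-shelf $k$-nearest neighbor regressor, here scikit-learn's `KNeighborsRegressor`, in the reduced coordinates:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Toy multi-index data: D = 20 ambient dimensions, d = 2 active indices.
rng = np.random.default_rng(0)
N, D, d = 2000, 20, 2
A, _ = np.linalg.qr(rng.standard_normal((D, d)))   # true index space basis
X = rng.standard_normal((N, D))
Y = (X @ A[:, 0]) ** 3 + np.sin(X @ A[:, 1]) + 0.1 * rng.standard_normal(N)

# Stage 1: estimate the index space with the sketch above.
U_hat = rcls_index_estimate(X[:1500], Y[:1500], n_slices=20, d=d)

# Stage 2: k-NN regression on the projected d-dimensional samples;
# the generalization bound then depends on d rather than D.
knn = KNeighborsRegressor(n_neighbors=10)
knn.fit(X[:1500] @ U_hat, Y[:1500])
mse = np.mean((knn.predict(X[1500:] @ U_hat) - Y[1500:]) ** 2)
print(f"held-out MSE: {mse:.3f}")
```

Because the nonparametric stage only sees the $d$-dimensional projections $U^\top X$, its rate is governed by the intrinsic dimension $d$, which is the mechanism behind the minimax-optimal bounds stated in the abstract.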