Signed Support Recovery for Single Index Models in High-Dimensions

We study the support recovery problem for single index models $Y=f(\boldsymbol{X}^{\intercal} \boldsymbol{\beta},\varepsilon)$, where $f$ is an unknown link function, $\boldsymbol{X}\sim N_p(0,\mathbb{I}_{p})$, and $\boldsymbol{\beta}$ is an $s$-sparse unit vector with entries $\boldsymbol{\beta}_{i}\in \{\pm\frac{1}{\sqrt{s}},0\}$. In particular, we examine the performance of two computationally inexpensive algorithms: (a) the diagonal thresholding sliced inverse regression (DT-SIR) introduced by Lin et al. (2015); and (b) a semidefinite programming (SDP) approach inspired by Amini & Wainwright (2008). When $s=O(p^{1-\delta})$ for some $\delta>0$, we show that both procedures succeed in recovering the support of $\boldsymbol{\beta}$ as long as the rescaled sample size $\kappa=\frac{n}{s\log(p-s)}$ exceeds a certain critical threshold. Conversely, when $\kappa$ is smaller than a critical value, any algorithm fails to recover the support with probability at least $\frac{1}{2}$ asymptotically. In other words, both DT-SIR and the SDP approach are optimal in terms of sample size, up to a constant factor, for recovering the support of $\boldsymbol{\beta}$. We provide extensive simulations, as well as an application to a real dataset, to help verify our theoretical observations.
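To make the setting concrete, the following is a minimal Python sketch of a diagonal-thresholding SIR procedure on simulated data. It is an illustration under stated assumptions, not the procedure of Lin et al. (2015) as published: the cubic link, the noise level, the number of slices `H`, and the rule of keeping the `s` largest diagonal entries (which presumes `s` is known) are all choices made here for simplicity, and the function names are hypothetical. Signs are recoverable only up to a global flip.

```python
import numpy as np

def simulate(n, p, s, rng, link):
    """Draw (X, y, beta) from Y = f(X^T beta, eps) with X ~ N_p(0, I_p)."""
    beta = np.zeros(p)
    support = rng.choice(p, size=s, replace=False)
    beta[support] = rng.choice([-1.0, 1.0], size=s) / np.sqrt(s)
    X = rng.standard_normal((n, p))
    eps = rng.standard_normal(n)
    y = link(X @ beta, eps)
    return X, y, beta

def slice_means(X, y, H):
    """Sort by y, cut into H equal-size slices, return the H x p slice means."""
    n, p = X.shape
    m = n // H                       # observations per slice (remainder dropped)
    order = np.argsort(y)[: m * H]
    return X[order].reshape(H, m, p).mean(axis=1)

def dt_sir(X, y, s, H=10):
    """Simplified DT-SIR: screen coordinates by the diagonal of the SIR
    estimate of Cov(E[X|Y]), then read signs off the leading eigenvector."""
    p = X.shape[1]
    M = slice_means(X, y, H)
    diag = (M ** 2).mean(axis=0)     # E[X] = 0, so no centering is needed
    # Assumption: keep the s largest entries instead of a data-driven threshold.
    keep = np.sort(np.argsort(diag)[-s:])
    V = M[:, keep].T @ M[:, keep] / H
    _, U = np.linalg.eigh(V)         # eigenvalues in ascending order
    signs = np.zeros(p)
    signs[keep] = np.sign(U[:, -1])  # leading eigenvector, up to a global sign
    return keep, signs

rng = np.random.default_rng(0)
n, p, s = 2000, 200, 5
# Illustrative link f(z, eps) = z^3 + 0.5 * eps (an assumption, not from the paper).
X, y, beta = simulate(n, p, s, rng, link=lambda z, e: z ** 3 + 0.5 * e)
kappa = n / (s * np.log(p - s))      # rescaled sample size from the abstract
keep, signs = dt_sir(X, y, s)
print("kappa:", round(kappa, 1))
print("support recovered:", set(keep) == set(np.flatnonzero(beta)))
```

Shrinking `n` while holding `p` and `s` fixed lowers `kappa`, which gives a quick way to probe empirically where recovery starts to fail.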

[1] Jun S. Liu, et al. On consistency and sparsity for sliced inverse regression in high dimensions, 2015, 1507.03895.

[2] B. Nadler, et al. Do Semidefinite Relaxations Solve Sparse PCA up to the Information Limit, 2013, 1306.3690.

[3] Bo Jiang, et al. Variable selection for general index models via sliced inverse regression, 2013, 1304.4056.

[4] Zhou Yu, et al. Dimension reduction and predictor selection in semiparametric models, 2013.

[5] B. Nadler, et al. Do Semidefinite Relaxations Really Solve Sparse PCA, 2013.

[6] Philippe Rigollet, et al. Complexity Theoretic Lower Bounds for Sparse Principal Component Detection, 2013, COLT.

[7] Gábor Lugosi, et al. Concentration Inequalities - A Nonasymptotic Theory of Independence, 2013, Concentration Inequalities.

[8] T. Cai, et al. Sparse PCA: Optimal rates and adaptive estimation, 2012, 1211.1309.

[9] Wenxuan Zhong, et al. Correlation pursuit: forward stepwise variable selection for index models, 2012, Journal of the Royal Statistical Society. Series B, Statistical methodology.

[10] Jing Lei, et al. Minimax Rates of Estimation for Sparse PCA in High Dimensions, 2012, AISTATS.

[11] Roman Vershynin, et al. Introduction to the non-asymptotic analysis of random matrices, 2010, Compressed Sensing.

[12] Laurent El Ghaoui, et al. Large-Scale Sparse Principal Component Analysis with Application to Text Data, 2011, NIPS.

[13] T. Cai, et al. A Constrained ℓ1 Minimization Approach to Sparse Precision Matrix Estimation, 2011, 1102.2233.

[14] Alkes L. Price, et al. New approaches to population stratification in genome-wide association studies, 2010, Nature Reviews Genetics.

[15] W. Wong, et al. ChIP-Seq of transcription factors predicts absolute and differential gene expression in embryonic stem cells, 2009, Proceedings of the National Academy of Sciences.

[16] I. Johnstone, et al. On Consistency and Sparsity for Principal Components Analysis in High Dimensions, 2009, Journal of the American Statistical Association.

[17] I. Johnstone, et al. Sparse Principal Components Analysis, 2009, 0901.4392.

[18] Martin J. Wainwright, et al. Information-Theoretic Limits on Sparsity Recovery in the High-Dimensional and Noisy Setting, 2007, IEEE Transactions on Information Theory.

[19] N. D. Clarke, et al. Integration of External Signaling Pathways with the Core Transcriptional Network in Embryonic Stem Cells, 2008, Cell.

[20] S. Ranade, et al. Stem cell transcriptome profiling via massive-scale mRNA sequencing, 2008, Nature Methods.

[21] M. Wainwright, et al. High-dimensional analysis of semidefinite relaxations for sparse principal components, 2008, 2008 IEEE International Symposium on Information Theory.

[22] Alexandre d'Aspremont, et al. Optimal Solutions for Sparse Principal Component Analysis, 2007, J. Mach. Learn. Res.

[23] E. Candès, et al. The Dantzig selector: Statistical estimation when P is much larger than n, 2005, math/0506081.

[24] Michael I. Jordan, et al. A Direct Formulation for Sparse PCA Using Semidefinite Programming, 2004, SIAM Rev.

[25] D. Paul. Asymptotics of Sample Eigenstructure for a Large Dimensional Spiked Covariance Model, 2007.

[26] Lexin Li, et al. Sparse Sliced Inverse Regression, 2006, Technometrics.

[27] Lixing Zhu, et al. On Sliced Inverse Regression With High-Dimensional Covariates, 2006.

[28] R. Cook, et al. Sufficient Dimension Reduction via Inverse Regression, 2005.

[29] R. Dennis Cook, et al. Testing predictor contributions in sufficient dimension reduction, 2004, math/0406520.

[30] P. Massart, et al. Adaptive estimation of a quadratic functional by model selection, 2000.

[31] William Bialek, et al. Adaptive Rescaling Maximizes Information Transmission, 2000, Neuron.

[32] Yuhong Yang, et al. Information-theoretic determination of minimax rates of convergence, 1999.

[33] R. Tibshirani. The lasso method for variable selection in the Cox model, 1997, Statistics in medicine.

[34] Jérôme Saracco, et al. An asymptotic theory for sliced inverse regression, 1997.

[35] Bin Yu. Assouad, Fano, and Le Cam, 1997.

[36] Ker-Chau Li, et al. Slicing Regression: A Link-Free Regression Method, 1991.

[37] Ker-Chau Li, et al. Sliced Inverse Regression for Dimension Reduction, 1991.

[38] Thomas M. Cover, et al. Elements of Information Theory, 2005.

[39] P. Massart. The Tight Constant in the Dvoretzky-Kiefer-Wolfowitz Inequality, 1990.

[40] Ker-Chau Li, et al. Regression Analysis Under Link Violation, 1989.

[41] R. Mukerjee. New Approaches to Population, 1941.