Optimal Linear Estimation under Unknown Nonlinear Transform

Linear regression studies the problem of estimating a model parameter β* ∈ ℝ^p from n observations (y_i, x_i), i = 1, …, n, drawn from the linear model y_i = ⟨x_i, β*⟩ + ε_i. We consider a significant generalization in which the relationship between ⟨x_i, β*⟩ and y_i is noisy, quantized to a single bit, potentially nonlinear, noninvertible, as well as unknown. This model is known as the single-index model in statistics and, among other things, it substantially generalizes one-bit compressed sensing. We propose a novel spectral-based estimation procedure and show that it recovers β* in settings (i.e., classes of link function f) where previous algorithms fail. In general, our algorithm requires only very mild restrictions on the (unknown) functional relationship between y_i and ⟨x_i, β*⟩. We also consider the high-dimensional setting where β* is sparse, and introduce a two-stage nonconvex framework that addresses the estimation challenges arising when p ≫ n. For a broad class of link functions between ⟨x_i, β*⟩ and y_i, we establish minimax lower bounds that demonstrate the optimality of our estimators in both the classical and high-dimensional regimes.
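To make the setting concrete, the following is a minimal sketch of a generic second-moment spectral estimator on simulated single-index data. It is not claimed to be the procedure proposed in the paper. It assumes standard Gaussian covariates, a unit-norm β*, and an illustrative one-bit, noninvertible link y = sign(⟨x, β*⟩² − 1) with random sign flips; the function names, the link, and all parameter values are hypothetical choices made for this example. Under these assumptions E[y (x xᵀ − I)] is a multiple of β* β*ᵀ, so the leading eigenvector of its sample version estimates β* up to sign, while the linear statistic E[y x] vanishes for this even link.

```python
import numpy as np


def simulate_single_index(n, p, flip_prob=0.1, seed=0):
    """Draw (X, y, beta) from an illustrative one-bit single-index model.

    Assumptions (not from the paper): x ~ N(0, I_p), ||beta|| = 1,
    y = sign(<x, beta>^2 - 1), each label flipped with probability flip_prob.
    """
    rng = np.random.default_rng(seed)
    beta = rng.standard_normal(p)
    beta /= np.linalg.norm(beta)           # unit-norm true parameter
    X = rng.standard_normal((n, p))        # standard Gaussian covariates
    z = X @ beta                           # the index <x_i, beta>
    y = np.sign(z ** 2 - 1.0)              # one-bit, even, noninvertible link
    flips = rng.random(n) < flip_prob      # label noise: random sign flips
    y[flips] *= -1.0
    return X, y, beta


def second_moment_estimate(X, y):
    """Leading eigenvector of M = (1/n) sum_i y_i x_i x_i^T - mean(y) * I.

    For Gaussian x the population version of M is proportional to
    beta beta^T, so the eigenvector with the largest absolute eigenvalue
    estimates beta up to sign.
    """
    n, p = X.shape
    M = (X * y[:, None]).T @ X / n - y.mean() * np.eye(p)
    eigvals, eigvecs = np.linalg.eigh(M)   # M is symmetric
    k = np.argmax(np.abs(eigvals))         # eigenvalue may be negative
    return eigvecs[:, k]


def linear_estimate(X, y):
    """First-moment statistic (1/n) sum_i y_i x_i, for comparison."""
    return X.T @ y / X.shape[0]
```

A possible usage: call `X, y, beta = simulate_single_index(20000, 10)` and compare `abs(second_moment_estimate(X, y) @ beta)` with the norm of `linear_estimate(X, y)`. In this simulated setting the former should be close to 1 and the latter close to 0, which illustrates why a first-moment method can fail for links that are not monotone.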
