On Principal Components Regression, Random Projections, and Column Subsampling

Principal Components Regression (PCR) is a traditional tool for dimension reduction in linear regression that has been both criticized and defended. One concern about PCR is that obtaining the leading principal components tends to be computationally demanding for large data sets. While random projections do not possess the optimality properties of the leading principal subspace, they are computationally appealing and hence have become increasingly popular in recent years. In this paper, we present an analysis showing that for random projections satisfying a Johnson-Lindenstrauss embedding property, the prediction error in subsequent regression is close to that of PCR, at the expense of requiring a slightly larger number of random projections than principal components. Column subsampling constitutes an even cheaper means of randomized dimension reduction outside the class of Johnson-Lindenstrauss transforms. We provide numerical results based on synthetic and real data, as well as basic theory revealing differences and commonalities in terms of statistical performance.
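The three dimension-reduction strategies compared in the abstract can be illustrated side by side. Below is a minimal sketch (not from the paper) using NumPy and scikit-learn on synthetic Gaussian data: the reduced dimension k, the data-generating model, and all variable names are illustrative assumptions, chosen only to show how PCR, a Gaussian (Johnson-Lindenstrauss) random projection, and plain column subsampling each reduce the design matrix before least squares.

```python
# Minimal sketch (illustrative, not the paper's experiments): compare PCR,
# Gaussian random projection, and column subsampling as dimension-reduction
# steps before least-squares regression on synthetic data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n, p, k = 500, 200, 20                      # samples, features, reduced dimension
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)
y = X @ beta + rng.standard_normal(n)

def reduced_ls_error(Z):
    """Fit least squares on the reduced design Z and return in-sample MSE."""
    model = LinearRegression().fit(Z, y)
    return mean_squared_error(y, model.predict(Z))

# 1) PCR: project X onto its k leading principal components.
Z_pcr = PCA(n_components=k).fit_transform(X)

# 2) Random projection: X @ R with R a p-by-k Gaussian (JL-type) matrix.
R = rng.standard_normal((p, k)) / np.sqrt(k)
Z_rp = X @ R

# 3) Column subsampling: keep k columns of X chosen uniformly at random.
cols = rng.choice(p, size=k, replace=False)
Z_cs = X[:, cols]

for name, Z in [("PCR", Z_pcr),
                ("random projection", Z_rp),
                ("column subsampling", Z_cs)]:
    print(f"{name:>20s}: MSE = {reduced_ls_error(Z):.3f}")
```

In this toy setup, the random-projection fit typically tracks PCR once k is modestly increased, while column subsampling is cheaper still but more variable, mirroring the qualitative comparison stated in the abstract.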
