Subspace perspective on canonical correlation analysis: Dimension reduction and minimax rates

Canonical correlation analysis (CCA) is a fundamental statistical tool for exploring the correlation structure between two sets of random variables. In this paper, motivated by the recent success of applying CCA to learn low-dimensional representations of high-dimensional objects, we propose to quantify the estimation loss of CCA by the excess prediction loss defined through a prediction-after-dimension-reduction framework. Such a framework suggests viewing CCA estimation as estimating the subspaces spanned by the canonical variates. Interestingly, the proposed error metrics derived from the excess prediction loss turn out to be closely related to the principal angles between the subspaces spanned by the population and sample canonical variates, respectively. We characterize the non-asymptotic minimax rates under the proposed metrics, especially the dependency of the minimax rates on the key quantities, including the dimensions, the condition number of the covariance matrices, the canonical correlations, and the eigen-gap, with minimal assumptions on the joint covariance matrix. To the best of our knowledge, this is the first finite-sample result that captures the effect of the canonical correlations on the minimax rates.
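
For readers unfamiliar with principal angles, the following minimal sketch shows how they can be computed between two subspaces given by basis matrices. It is only an illustration under stated assumptions: the function names `principal_angles` and `sin_theta_loss` are hypothetical, and the squared sin-theta (Frobenius-type) loss shown here is a standard subspace distance, not necessarily the exact metric defined in the paper.

```python
import numpy as np


def principal_angles(A, B):
    """Principal angles (radians, ascending) between the column spaces of A and B.

    Assumes A and B have full column rank and the same number of rows.
    """
    # Orthonormal bases for the two column spaces.
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    # Singular values of Qa^T Qb are the cosines of the principal angles.
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))


def sin_theta_loss(A, B):
    """Sum of squared sines of the principal angles (an assumed, standard loss)."""
    return float(np.sum(np.sin(principal_angles(A, B)) ** 2))


# Hypothetical example: span{e1, e2} versus span{e1, e3} in R^3.
A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
B = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
angles = principal_angles(A, B)
loss = sin_theta_loss(A, B)
```

The loss depends only on the subspaces, not on the particular bases chosen, which is the property that makes a subspace view of CCA estimation natural: replacing `A` by `A @ M` for any invertible `M` leaves it unchanged up to floating-point error.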
