Finding Linear Structure in Large Datasets with Scalable Canonical Correlation Analysis

Canonical Correlation Analysis (CCA) is a widely used spectral technique for finding correlation structures in multi-view datasets. In this paper, we tackle the problem of large scale CCA, where classical algorithms, usually requiring computing the product of two huge matrices and huge matrix decomposition, are computationally and storage expensive. We recast CCA from a novel perspective and propose a scalable and memory efficient Augmented Approximate Gradient (AppGrad) scheme for finding top k dimensional canonical subspace which only involves large matrix multiplying a thin matrix of width k and small matrix decomposition of dimension k × k. Further, AppGrad achieves optimal storage complexity O(k(p1+p2)), compared with classical algorithms which usually require O(p12+(p22) space to store two dense whitening matrices. The proposed scheme naturally generalizes to stochastic optimization regime, especially efficient for huge datasets where batch algorithms are prohibitive. The online property of stochastic AppGrad is also well suited to the streaming scenario, where data comes sequentially. To the best of our knowledge, it is the first stochastic algorithm for CCA. Experiments on four real data sets are provided to show the effectiveness of the proposed methods.

[1]  Nathan Srebro,et al.  SVM optimization: inverse dependence on training set size , 2008, ICML '08.

[2]  Ioannis Mitliagkas,et al.  Memory Limited, Streaming PCA , 2013, NIPS.

[3]  Léon Bottou,et al.  The Tradeoffs of Large Scale Learning , 2007, NIPS.

[4]  E. Oja,et al.  On stochastic approximation of the eigenvectors and eigenvalues of the expectation of a random matrix , 1985 .

[5]  Mark Johnson,et al.  SVD and Clustering for Unsupervised POS Tagging , 2010, ACL.

[6]  Dean P. Foster Multi-View Dimensionality Reduction via Canonical Correlation Multi-View Dimensionality Reduction via Canonical Correlation Analysis Analysis Multi-View Dimensionality Reduction via Canonical Correlation Analysis Multi-View Dimensionality Reduction via Canonical Correlation Analysis Multi-View Dimen , 2008 .

[7]  Joachim M. Buhmann,et al.  Correlated random features for fast semi-supervised learning , 2013, NIPS.

[8]  V. Rokhlin,et al.  A randomized algorithm for the approximation of matrices , 2006 .

[9]  Marcel Worring,et al.  The challenge problem for automated detection of 101 semantic concepts in multimedia , 2006, MM '06.

[10]  Gene H. Golub,et al.  Numerical methods for computing angles between linear subspaces , 1971, Milestones in Matrix Computation.

[11]  Daoqiang Zhang,et al.  Multi-view dimensionality reduction via canonical random correlation analysis , 2015, Frontiers of Computer Science.

[12]  Sanjoy Dasgupta,et al.  The Fast Convergence of Incremental PCA , 2013, NIPS.

[13]  Léon Bottou,et al.  Large-Scale Machine Learning with Stochastic Gradient Descent , 2010, COMPSTAT.

[14]  Christoph H. Lampert,et al.  Correlational spectral clustering , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Nathan Srebro,et al.  Stochastic optimization for PCA and PLS , 2012, 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[16]  Harrison H. Zhou,et al.  Sparse CCA: Adaptive Estimation and Computational Barriers , 2014, 1409.8565.

[17]  R. Tibshirani,et al.  A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. , 2009, Biostatistics.

[18]  Per-Gunnar Martinsson,et al.  Randomized algorithms for the low-rank approximation of matrices , 2007, Proceedings of the National Academy of Sciences.

[19]  H. Hotelling Relations Between Two Sets of Variates , 1936 .

[20]  Lawrence K. Saul,et al.  Identifying suspicious URLs: an application of large-scale online learning , 2009, ICML '09.

[21]  Dean P. Foster,et al.  Two Step CCA: A new spectral method for estimating vector models of words , 2012, ICML 2012.

[22]  Xi Chen,et al.  Structured Sparse Canonical Correlation Analysis , 2012, AISTATS.

[23]  Christos Boutsidis,et al.  Efficient Dimensionality Reduction for Canonical Correlation Analysis , 2012, SIAM J. Sci. Comput..

[24]  Dean P. Foster,et al.  Multi-View Learning of Word Embeddings via CCA , 2011, NIPS.

[25]  Tamás Sarlós,et al.  Improved Approximation Algorithms for Large Matrices via Random Projections , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[26]  Nathan Halko,et al.  Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions , 2009, SIAM Rev..

[27]  V. Rokhlin,et al.  A fast randomized algorithm for the approximation of matrices ✩ , 2007 .

[28]  Sham M. Kakade,et al.  Multi-view Regression Via Canonical Correlation Analysis , 2007, COLT.

[29]  Dean P. Foster,et al.  Large Scale Canonical Correlation Analysis with Iterative Least Squares , 2014, NIPS.

[30]  Sham M. Kakade,et al.  Multi-view clustering via canonical correlation analysis , 2009, ICML '09.

[31]  G. Golub,et al.  The canonical correlations of matrix pairs and their numerical computation , 1992 .