Efficient and Distributed Generalized Canonical Correlation Analysis for Big Multiview Data

Generalized canonical correlation analysis (GCCA), an extension of classical two-view CCA, integrates information from data samples acquired in multiple feature spaces (or 'views') to produce low-dimensional representations. Since the 1960s, (G)CCA has attracted much attention in statistics, machine learning, and data mining because of its importance in data analytics. Despite these efforts, existing GCCA algorithms have serious scalability issues: their memory and computational complexities usually grow quadratically and cubically, respectively, in the problem dimension (the number of samples/features). For example, handling views with $\approx 1{,}000$ features using such algorithms already occupies $\approx 10^6$ units of memory and costs $\approx 10^9$ flops per iteration, which makes it hard to push these methods much further. To circumvent such difficulties, we first propose a GCCA algorithm whose memory and computational costs scale linearly in the problem dimension and the number of nonzero data elements, respectively. Consequently, the proposed algorithm can easily handle very large sparse views whose sample and feature dimensions both exceed $\approx 100{,}000$. Our second contribution is a pair of distributed GCCA algorithms that compute the canonical components of different views in parallel, and can thus further reduce the runtime significantly when multiple computing agents are available. We provide detailed convergence analyses and show that all of the proposed large-scale GCCA algorithms converge to a Karush-Kuhn-Tucker (KKT) point at least sublinearly. Judiciously designed synthetic- and real-data experiments showcase the effectiveness of the proposed algorithms.
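To make the scaling argument concrete, the sketch below illustrates one classical formulation that such large-scale methods target: MAX-VAR GCCA, which seeks a shared representation $G$ with orthonormal columns and per-view loadings $Q_i$ minimizing $\sum_i \|X_i Q_i - G\|_F^2$. This is a minimal illustration under stated assumptions, not the paper's exact algorithm: the alternating scheme, the choice of SciPy's LSQR solver, and all names (`maxvar_gcca`, `n_iter`, etc.) are assumptions made here for exposition. The key point is that LSQR only needs sparse matrix-vector products with each $X_i$, so the per-iteration cost stays proportional to the number of nonzero data elements, in line with the linear-scaling goal described above.

```python
# Minimal sketch of alternating least squares for MAX-VAR GCCA.
# All details (solver choice, update order, names) are illustrative
# assumptions, not the paper's exact algorithm.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsqr


def maxvar_gcca(X, K, n_iter=20, seed=0):
    """X: list of sparse L x M_i views; returns (G, Q) with G^T G = I."""
    L = X[0].shape[0]
    rng = np.random.default_rng(seed)
    # Random orthonormal initialization of the common factor G (L x K).
    G, _ = np.linalg.qr(rng.standard_normal((L, K)))
    Q = [np.zeros((Xi.shape[1], K)) for Xi in X]
    for _ in range(n_iter):
        # Q-step: K sparse least-squares problems per view, solved with
        # LSQR so X_i^T X_i is never formed (cost ~ nnz(X_i) per solve).
        for i, Xi in enumerate(X):
            for k in range(K):
                Q[i][:, k] = lsqr(Xi, G[:, k], atol=1e-8, btol=1e-8)[0]
        # G-step: orthogonal Procrustes -- the closest orthonormal-column
        # matrix to the sum of the projected views, via a thin SVD.
        M = sum(Xi @ Qi for Xi, Qi in zip(X, Q))
        U, _, Vt = np.linalg.svd(M, full_matrices=False)
        G = U @ Vt
    return G, Q


# Example: three random sparse views sharing 10,000 samples.
X = [sp.random(10_000, m, density=1e-3, format="csr", random_state=i)
     for i, m in enumerate((500, 800, 1_200))]
G, Q = maxvar_gcca(X, K=5, n_iter=5)
```

Two remarks on the design. First, the Q-step decouples across views, which is what makes per-view distribution across computing agents natural; each agent can update its own $Q_i$ and communicate only $X_i Q_i$ for the G-step. Second, a practical implementation would warm-start and cap the inner LSQR iterations rather than solve each subproblem to high accuracy as done above, which is kept exact here only for clarity.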
