Towards Practical Alternating Least-Squares for CCA

Alternating least-squares (ALS) is a simple yet effective solver for canonical correlation analysis (CCA). In terms of ease of use, ALS is arguably practitioners' first choice. Despite recent provably guaranteed variants, their empirical performance often remains unsatisfactory. To promote the practical use of ALS for CCA, we propose truly alternating least-squares. Instead of approximately solving two independent linear systems, in each iteration it solves two coupled linear systems of half the size. It turns out that this coupling brings significant performance improvements in practice. Inspired by the accelerated power method, we further propose faster alternating least-squares, where momentum terms are introduced into the update equations. Both algorithms enjoy linear convergence. To make faster ALS even more practical, we put forward adaptive alternating least-squares, which avoids tuning the momentum parameter and is as easy to use as plain ALS while retaining the advantages of the fast version. Experiments on several datasets empirically demonstrate the superiority of the proposed algorithms over recent variants.
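To make the coupled update concrete, the following is a minimal NumPy sketch of top-1 ALS for CCA in the spirit described above: each iteration solves two least-squares problems, the fresh `u` is used immediately when updating `v` (the coupling), and a `beta` parameter adds a heavy-ball-style momentum term. The parameterization (`beta`, `reg`, the normalization placement) is a hypothetical illustration under standard assumptions, not the paper's exact update rule.

```python
import numpy as np

def als_cca(X, Y, n_iter=200, beta=0.0, reg=1e-8, seed=0):
    """Sketch of alternating least-squares for the top canonical pair.

    X: (n, dx) and Y: (n, dy) data matrices (rows are samples).
    beta=0 gives plain coupled ALS; beta>0 adds a momentum term
    (hypothetical heavy-ball-style variant, for illustration only).
    Returns (u, v, corr) with u, v normalized so u'Cxx u = v'Cyy v = 1.
    """
    n, dx = X.shape
    dy = Y.shape[1]
    rng = np.random.default_rng(seed)
    # Empirical (auto-)covariances, with a small ridge for stability.
    Cxx = X.T @ X / n + reg * np.eye(dx)
    Cyy = Y.T @ Y / n + reg * np.eye(dy)
    Cxy = X.T @ Y / n
    # Random start, normalized in the covariance-induced norm.
    u = rng.standard_normal(dx); u /= np.sqrt(u @ Cxx @ u)
    v = rng.standard_normal(dy); v /= np.sqrt(v @ Cyy @ v)
    u_prev, v_prev = u.copy(), v.copy()
    for _ in range(n_iter):
        # Least-squares step for u given v (solve Cxx u = Cxy v).
        u_new = np.linalg.solve(Cxx, Cxy @ v) - beta * u_prev
        u_prev = u
        u = u_new / np.sqrt(u_new @ Cxx @ u_new)
        # Coupled step: v is updated with the *fresh* u, not the old one.
        v_new = np.linalg.solve(Cyy, Cxy.T @ u) - beta * v_prev
        v_prev = v
        v = v_new / np.sqrt(v_new @ Cyy @ v_new)
    corr = u @ Cxy @ v  # estimated top canonical correlation
    return u, v, corr
```

With `beta=0` this reduces to power iteration on the whitened cross-covariance operator, so the returned `corr` should approach the largest canonical correlation; a momentum schedule or the adaptive choice described in the abstract would replace the fixed `beta` in practice.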
