Efficient coordinate-descent for orthogonal matrices through Givens rotations

Optimizing over the set of orthogonal matrices is a central component in problems like sparse-PCA or tensor decomposition. Unfortunately, such optimization is hard since simple operations on orthogonal matrices easily break orthogonality, and correcting orthogonality usually costs a large amount of computation. Here we propose a framework for optimizing orthogonal matrices, that is the parallel of coordinate-descent in Euclidean spaces. It is based on {\em Givens-rotations}, a fast-to-compute operation that affects a small number of entries in the learned matrix, and preserves orthogonality. We show two applications of this approach: an algorithm for tensor decomposition that is used in learning mixture models, and an algorithm for sparse-PCA. We study the parameter regime where a Givens rotation approach converges faster and achieves a superior model on a genome-wide brain-wide mRNA expression dataset.

[1]  Tamara G. Kolda,et al.  Tensor Decompositions and Applications , 2009, SIAM Rev..

[2]  Anima Anandkumar,et al.  A tensor approach to learning mixed membership community models , 2013, J. Mach. Learn. Res..

[3]  Percy Liang,et al.  Spectral Experts for Estimating Mixtures of Linear Regressions , 2013, ICML.

[4]  Peter Richtárik,et al.  Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function , 2011, Mathematical Programming.

[5]  Yurii Nesterov,et al.  Generalized Power Method for Sparse Principal Component Analysis , 2008, J. Mach. Learn. Res..

[6]  K. Schittkowski,et al.  NONLINEAR PROGRAMMING , 2022 .

[7]  Hinrich Schütze,et al.  Introduction to information retrieval , 2008 .

[8]  Anima Anandkumar,et al.  A Spectral Algorithm for Latent Dirichlet Allocation , 2012, Algorithmica.

[9]  J. Kruskal Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics , 1977 .

[10]  L. Armijo Minimization of functions having Lipschitz continuous first partial derivatives. , 1966 .

[11]  Paul Van Dooren,et al.  Jacobi Algorithm for the Best Low Multilinear Rank Approximation of Symmetric Tensors , 2013, SIAM J. Matrix Anal. Appl..

[12]  Michael I. Jordan,et al.  A Direct Formulation for Sparse Pca Using Semidefinite Programming , 2004, SIAM Rev..

[13]  Levent Tunçel,et al.  Optimization algorithms on matrix manifolds , 2009, Math. Comput..

[14]  Anima Anandkumar,et al.  A Tensor Spectral Approach to Learning Mixed Membership Community Models , 2013, COLT.

[15]  Dimitri P. Bertsekas,et al.  Nonlinear Programming , 1997 .

[16]  Allan R. Jones,et al.  An anatomically comprehensive atlas of the adult human brain transcriptome , 2012, Nature.

[17]  Ion Necoara,et al.  Efficient random coordinate descent algorithms for large-scale structured nonconvex optimization , 2013, Journal of Global Optimization.

[18]  Robert H. Halstead,et al.  Matrix Computations , 2011, Encyclopedia of Parallel Computing.

[19]  Anima Anandkumar,et al.  A Method of Moments for Mixture Models and Hidden Markov Models , 2012, COLT.

[20]  Laurent El Ghaoui,et al.  Large-Scale Sparse Principal Component Analysis with Application to Text Data , 2011, NIPS.

[21]  E. Lander,et al.  Gene expression correlates of clinical prostate cancer behavior. , 2002, Cancer cell.

[22]  L. Ghaoui,et al.  Sparse PCA: Convex Relaxations, Algorithms and Applications , 2010, 1011.3781.

[23]  Anima Anandkumar,et al.  Tensor decompositions for learning latent variable models , 2012, J. Mach. Learn. Res..

[24]  Yurii Nesterov,et al.  Efficiency of Coordinate Descent Methods on Huge-Scale Optimization Problems , 2012, SIAM J. Optim..

[25]  Sham M. Kakade,et al.  Learning mixtures of spherical gaussians: moment methods and spectral decompositions , 2012, ITCS '13.

[26]  R. Tibshirani,et al.  Sparse Principal Component Analysis , 2006 .