Efficient coordinate-descent for orthogonal matrices through Givens rotations

Optimizing over the set of orthogonal matrices is a central component in problems such as sparse PCA and tensor decomposition. Unfortunately, such optimization is hard: simple operations on orthogonal matrices easily break orthogonality, and restoring it usually requires a large amount of computation. Here we propose a framework for optimizing orthogonal matrices that is the parallel of coordinate descent in Euclidean spaces. It is based on Givens rotations, a fast-to-compute operation that affects a small number of entries in the learned matrix and preserves orthogonality. We show two applications of this approach: an algorithm for tensor decomposition that is used in learning mixture models, and an algorithm for sparse PCA. We study the parameter regime where a Givens rotation approach converges faster and achieves a superior model on a genome-wide brain-wide mRNA expression dataset.
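To make the central property concrete, the following is a minimal sketch (not the authors' implementation) of right-multiplying a matrix by a single Givens rotation in the plane of coordinates i and j. The helper names `identity`, `apply_givens`, and `orthogonality_error` are hypothetical and chosen for illustration only, and the sign convention for the rotation is an assumption. The sketch shows that one rotation touches only two columns and leaves the matrix orthogonal.

```python
import math


def identity(d):
    """Return the d x d identity matrix as a list of rows."""
    return [[1.0 if r == c else 0.0 for c in range(d)] for r in range(d)]


def apply_givens(U, i, j, theta):
    """Rotate columns i and j of U by angle theta; all other columns are copied.

    This equals U times a rotation matrix that differs from the identity
    only in entries (i, i), (i, j), (j, i), (j, j), so the cost is O(d).
    """
    c, s = math.cos(theta), math.sin(theta)
    V = [row[:] for row in U]
    for r in range(len(U)):
        ui, uj = U[r][i], U[r][j]
        V[r][i] = c * ui - s * uj
        V[r][j] = s * ui + c * uj
    return V


def orthogonality_error(U):
    """Largest absolute entry of (U^T U - I); zero for an exactly orthogonal U."""
    d = len(U)
    err = 0.0
    for a in range(d):
        for b in range(d):
            dot = sum(U[r][a] * U[r][b] for r in range(d))
            err = max(err, abs(dot - (1.0 if a == b else 0.0)))
    return err


# Build a non-trivial orthogonal matrix from two rotations, then apply a third.
U = identity(4)
U = apply_givens(U, 0, 2, 0.3)
U = apply_givens(U, 1, 3, -1.1)
V = apply_givens(U, 0, 1, 0.7)

# Indices of the columns that differ between U and V.
changed = sorted({c for r in range(4) for c in range(4) if V[r][c] != U[r][c]})
print("changed columns:", changed)
print("orthogonality error:", orthogonality_error(V))
```

In a coordinate-descent setting, one would pick a pair (i, j), choose the angle theta that most decreases the objective along that rotation, and repeat. How the pair and the angle are selected depends on the objective and is not shown here.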

Gal Chechik | Uri Shalit
