A Riemannian gossip approach to subspace learning on Grassmann manifold

In this paper, we focus on subspace learning problems on the Grassmann manifold. Interesting applications in this setting include low-rank matrix completion and low-dimensional multivariate regression, among others. Motivated by privacy concerns, we aim to solve such problems in a decentralized setting, where multiple agents have access to (and solve) only a part of the whole optimization problem. The agents communicate with each other to arrive at a consensus, i.e., to agree on a common quantity, via the gossip protocol. We propose a novel cost function for subspace learning on the Grassmann manifold: a weighted sum of several sub-problems (each solved by an agent) and of the communication cost among the agents. The cost function has a finite-sum structure. In the proposed modeling approach, the agents learn individual local subspaces while achieving asymptotic consensus on the global learned subspace. The approach is scalable and parallelizable. Numerical experiments show the efficacy of the proposed decentralized algorithms on various matrix completion and multivariate regression benchmarks.
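The gossip mechanism described above — a pair of agents each taking a Riemannian gradient step on its own local cost plus a penalty that pulls the two subspaces toward consensus — can be sketched in a few lines. This is a minimal illustration under assumptions of our own, not the paper's algorithm: we represent a point on the Grassmann manifold by an orthonormal matrix, use a squared-Frobenius misalignment term as the consensus penalty, and retract back to the manifold with a QR decomposition.

```python
import numpy as np

def qr_retract(U, xi):
    # Retract U + xi back onto the set of orthonormal matrices via QR;
    # the sign fix makes the factorization (and hence the step) deterministic.
    Q, R = np.linalg.qr(U + xi)
    return Q * np.sign(np.sign(np.diag(R)) + 0.5)

def grassmann_grad(U, G):
    # Project an ambient-space gradient G onto the horizontal space at U,
    # i.e., remove the component tangent to rotations within span(U).
    return G - U @ (U.T @ G)

def gossip_round(U_a, U_b, grad_a, grad_b, rho, step):
    # One gossip exchange between agents a and b: each descends its local
    # cost gradient plus the gradient of a consensus penalty
    # (rho/2) * ||(I - U_b U_b^T) U_a||_F^2 (an assumed surrogate for the
    # subspace distance), then retracts back to the manifold.
    cons_a = U_a - U_b @ (U_b.T @ U_a)
    cons_b = U_b - U_a @ (U_a.T @ U_b)
    xi_a = -step * grassmann_grad(U_a, grad_a + rho * cons_a)
    xi_b = -step * grassmann_grad(U_b, grad_b + rho * cons_b)
    return qr_retract(U_a, xi_a), qr_retract(U_b, xi_b)
```

Repeating such rounds over randomly chosen agent pairs, with the local gradients supplied by each agent's sub-problem (e.g., its share of a matrix completion task), drives the individual subspaces toward a common one while each agent only ever sees its own data.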
