Decentralized Dictionary Learning Over Time-Varying Digraphs

This paper studies Dictionary Learning problems in which the learning task is distributed over a multi-agent network, modeled as a time-varying directed graph. This formulation is relevant, for instance, in Big Data scenarios where massive amounts of data are collected or stored in different locations (e.g., sensors, clouds), and aggregating or processing all data in a fusion center might be inefficient or infeasible due to resource limitations, communication overhead, or privacy issues. We develop a unified decentralized algorithmic framework for this class of nonconvex problems, which provably converges to stationary solutions at a sublinear rate. The new method hinges on Successive Convex Approximation techniques, coupled with a decentralized tracking mechanism that locally estimates the gradient of the smooth part of the sum-utility. To the best of our knowledge, this is the first provably convergent decentralized algorithm for Dictionary Learning and, more generally, bi-convex problems over (time-varying) (di)graphs.
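The tracking idea mentioned above can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it assumes a fixed undirected ring of four agents with doubly stochastic weights (rather than a time-varying digraph), scalar quadratic local costs f_i(x) = 0.5 (x - a_i)^2, and a plain gradient step in place of a Successive Convex Approximation surrogate. All names (`gradient_tracking`, `TARGETS`, `RING_WEIGHTS`) are hypothetical and chosen only for illustration.

```python
# Toy gradient tracking: each agent i holds a private cost
# f_i(x) = 0.5 * (x - a_i)^2 and only talks to its ring neighbours.
# The minimizer of the sum of the f_i is the average of the a_i.

def gradient_tracking(targets, weights, step=0.1, iters=300):
    """Run gradient tracking and return each agent's final estimate."""
    n = len(targets)

    def grad(i, x):
        # Gradient of the local quadratic cost of agent i.
        return x - targets[i]

    x = [0.0] * n                           # local copies of the variable
    y = [grad(i, x[i]) for i in range(n)]   # local gradient trackers

    for _ in range(iters):
        # Consensus on the variable, then a step along the tracked gradient.
        x_new = [
            sum(weights[i][j] * x[j] for j in range(n)) - step * y[i]
            for i in range(n)
        ]
        # Consensus on the tracker, corrected by the change in local gradient,
        # so that the average of y stays equal to the average local gradient.
        y = [
            sum(weights[i][j] * y[j] for j in range(n))
            + grad(i, x_new[i]) - grad(i, x[i])
            for i in range(n)
        ]
        x = x_new
    return x


# Assumed data: four agents on a ring, each averaging itself and its two
# neighbours with weight 1/3 (rows and columns both sum to one).
TARGETS = [1.0, 2.0, 3.0, 6.0]
RING_WEIGHTS = [
    [1 / 3, 1 / 3, 0.0, 1 / 3],
    [1 / 3, 1 / 3, 1 / 3, 0.0],
    [0.0, 1 / 3, 1 / 3, 1 / 3],
    [1 / 3, 0.0, 1 / 3, 1 / 3],
]

estimates = gradient_tracking(TARGETS, RING_WEIGHTS)
print(estimates)
```

In this simplified setting every agent's estimate approaches the average of the `TARGETS` values even though no agent ever sees another agent's cost directly. Handling directed, time-varying links and nonconvex objectives requires the additional machinery developed in the paper.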
