DeEPCA: Decentralized Exact PCA with Linear Convergence Rate

Due to the rapid growth of smart agents such as weakly connected computational nodes and sensors, developing decentralized algorithms that perform computations on local agents has become a major research direction. This paper considers decentralized principal component analysis (PCA), a statistical method widely used for data analysis. We introduce a technique called subspace tracking to reduce communication cost and apply it to power iterations. This leads to a decentralized PCA algorithm called DeEPCA, which has a convergence rate similar to that of centralized PCA while achieving the best communication complexity among existing decentralized PCA algorithms. DeEPCA is the first decentralized PCA algorithm whose number of communication rounds per power iteration is independent of the target precision. Compared to existing algorithms, the proposed method is easier to tune in practice and has a lower overall communication cost. Our experiments validate the advantages of DeEPCA empirically.
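The abstract's core primitive, power iteration carried out over a network, can be illustrated with a toy simulation. The sketch below is not DeEPCA itself: it omits the paper's subspace-tracking correction and instead re-runs plain gossip averaging to high accuracy inside every power step, which is exactly the per-iteration communication dependence on precision that DeEPCA removes. All concrete choices here (the ring topology, the `W_mix` mixing weights, the `gossip` helper, matrix sizes) are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, d, k = 4, 8, 2

# Each agent holds a local PSD matrix; the global target is their average.
# A shared diagonal term creates a clear eigengap so the iteration converges fast.
gap = np.diag([20.0, 15.0] + [1.0] * (d - 2))
local_A = []
for _ in range(n_agents):
    M = rng.standard_normal((d, d))
    local_A.append(M @ M.T / d + gap)
A_global = sum(local_A) / n_agents

# Doubly stochastic mixing matrix for a ring topology (illustrative choice).
W_mix = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W_mix[i, i] = 0.5
    W_mix[i, (i - 1) % n_agents] = 0.25
    W_mix[i, (i + 1) % n_agents] = 0.25

def gossip(mats, rounds=20):
    """Approximately average the per-agent (d, k) matrices by repeated mixing."""
    X = np.stack(mats)                          # shape (n_agents, d, k)
    for _ in range(rounds):
        X = np.einsum('ij,jdk->idk', W_mix, X)  # one neighbor-exchange round
    return X

# Decentralized power iteration: each agent applies its local matrix, the
# network gossip-averages the products, then each agent re-orthonormalizes.
V = [np.linalg.qr(rng.standard_normal((d, k)))[0]] * n_agents
for _ in range(50):
    products = [Ai @ Vi for Ai, Vi in zip(local_A, V)]
    averaged = gossip(products)
    V = [np.linalg.qr(averaged[i])[0] for i in range(n_agents)]

# Agent 0's subspace should now align with the top-k eigenvectors of A_global.
eigvecs = np.linalg.eigh(A_global)[1][:, -k:]
```

Because gossip averaging is only approximate, each power step above inherits an averaging error that must be driven down by extra communication rounds; the subspace-tracking idea in the paper is what decouples the per-iteration round count from the target precision.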
