A Review of Distributed Algorithms for Principal Component Analysis

Principal component analysis (PCA) is a fundamental primitive of many data analysis, array processing, and machine learning methods. In applications involving extremely large data arrays, particularly in distributed data acquisition systems, distributed PCA algorithms can harness local communications and network connectivity to avoid the need to communicate and access the entire array at a single location. A key feature of distributed PCA algorithms is that they defy the conventional notion that the first step toward computing the principal vectors is to form a sample covariance. This paper surveys the methodologies for performing distributed PCA on different data sets, their performance, and their applications in the context of distributed data acquisition systems.
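
As an illustration of how principal vectors can be obtained without ever forming a sample covariance, the following is a minimal sketch of a consensus-based distributed power method. It is not taken from any specific surveyed algorithm. It assumes the samples are already centered and split row-wise across nodes connected in a ring, that all nodes share a common random seed for the starting vector, and that exact summation is replaced by a fixed number of average-consensus rounds with uniform weights 1/3. The helper names (`local_product`, `ring_consensus`, `distributed_power_method`) are hypothetical. Each node only forms the d-vector X_i^T (X_i v), so neither the d×d covariance nor the raw samples leave a node.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def local_product(rows, v):
    """Return X_i^T (X_i v) for one node's rows, without forming X_i^T X_i."""
    out = [0.0] * len(v)
    for x in rows:
        s = sum(xj * vj for xj, vj in zip(x, v))  # scalar x^T v
        for j, xj in enumerate(x):
            out[j] += s * xj
    return out


def ring_consensus(values, rounds):
    """Average consensus on a ring of >= 3 nodes: each round, every node replaces
    its vector by the mean of its own and its two neighbors' vectors."""
    n = len(values)
    for _ in range(rounds):
        values = [
            [(values[(i - 1) % n][j] + values[i][j] + values[(i + 1) % n][j]) / 3.0
             for j in range(len(values[i]))]
            for i in range(n)
        ]
    return values


def distributed_power_method(partitions, d, iters=100, rounds=50, seed=0):
    """Hypothetical distributed power iteration for the top principal vector.

    partitions[i] holds node i's (assumed centered) samples as length-d lists.
    Returns one unit-norm estimate per node.
    """
    rng = random.Random(seed)  # assumption: all nodes share this seed
    v0 = normalize([rng.gauss(0.0, 1.0) for _ in range(d)])
    estimates = [list(v0) for _ in partitions]
    for _ in range(iters):
        # Each node applies its local second-moment matrix implicitly.
        local = [local_product(rows, v) for rows, v in zip(partitions, estimates)]
        # Neighbor exchanges approximate (1/N) * sum_i X_i^T X_i v at every node.
        averaged = ring_consensus(local, rounds)
        estimates = [normalize(a) for a in averaged]
    return estimates


if __name__ == "__main__":
    # Toy data: 40 samples spread mostly along (1, 1), dealt out to 4 nodes.
    rng = random.Random(1)
    samples = []
    for _ in range(40):
        t, s = rng.gauss(0.0, 3.0), rng.gauss(0.0, 0.3)
        samples.append([t + s, t - s])
    partitions = [samples[k::4] for k in range(4)]
    for i, est in enumerate(distributed_power_method(partitions, d=2)):
        print(i, [round(x, 3) for x in est])
```

Running the script prints one estimate per node; the four estimates agree with one another and point approximately along (1, 1)/√2. Power iteration suits this setting because each step needs only a matrix-vector product, which splits into a sum of per-node terms. Extending the sketch to several principal vectors would additionally require a distributed orthogonalization step, and practical schemes usually trade exact consensus for fewer exchanges per iteration.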
