Asynchronous gossip principal components analysis

Abstract: This paper addresses Principal Components Analysis (PCA) of data spread over a network in which central coordination and synchronous communication between nodes are forbidden. We propose an asynchronous, decentralized PCA algorithm dedicated to large-scale problems, where “large” applies simultaneously to dimensionality, number of observations, and network size. It integrates a dimension reduction step into a gossip consensus protocol. Unlike other approaches, a straightforward dual formulation makes it suitable when the observed dimensions are distributed across nodes. We show theoretically that it is equivalent to centralized PCA under a low-rank assumption on the training data. An experimental analysis reveals that it achieves good accuracy at a reasonable communication cost, even when the low-rank assumption is relaxed.
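To make the general idea concrete, the following is a minimal sketch of gossip-based PCA, not the algorithm proposed in the paper. It assumes a fully connected network, observations (rows) split across nodes, and a plain sum-weight (push-sum) averaging scheme in which nodes exchange full second-moment matrices; the in-protocol dimension reduction step and the dual formulation mentioned in the abstract are deliberately left out. The function name `gossip_pca` and its parameters are hypothetical.

```python
import numpy as np


def gossip_pca(local_data, q, n_rounds=2000, seed=0):
    """Illustrative sum-weight gossip PCA (assumption: complete graph,
    rows distributed across nodes, full matrices exchanged)."""
    rng = np.random.default_rng(seed)
    n_nodes = len(local_data)

    # Local sufficient statistics: sum of rows, sum of outer products, count.
    a = [X.sum(axis=0) for X in local_data]
    B = [X.T @ X for X in local_data]
    w = [float(X.shape[0]) for X in local_data]

    for _ in range(n_rounds):
        # One node wakes up and picks a different node uniformly at random.
        i = rng.integers(n_nodes)
        j = rng.integers(n_nodes - 1)
        if j >= i:
            j += 1
        # Sender keeps half of its mass and pushes the other half.
        a[i], B[i], w[i] = a[i] / 2, B[i] / 2, w[i] / 2
        a[j], B[j], w[j] = a[j] + a[i], B[j] + B[i], w[j] + w[i]

    bases = []
    for k in range(n_nodes):
        # Ratios of gossiped sums to gossiped weights estimate global moments.
        mu = a[k] / w[k]
        C = B[k] / w[k] - np.outer(mu, mu)
        # eigh returns eigenvalues in ascending order; keep the top q vectors.
        _, vecs = np.linalg.eigh(C)
        bases.append(vecs[:, ::-1][:, :q])
    return bases
```

Each node ends up with its own estimate of the leading `q` principal directions. Because every exchange conserves the total sums and weights, the local ratios approach the global mean and second moment, so the local bases span approximately the same subspace as a centralized PCA. Exchanging full `d × d` matrices is what becomes costly in high dimension, which is the motivation for reducing dimension inside the protocol.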
