A fault-tolerant peer-to-peer distributed EM algorithm

This paper proposes a distributed Expectation Maximization (EM) algorithm for estimating the parameters of a Gaussian mixture model in a peer-to-peer network. The algorithm performs density estimation and clustering on data distributed over the nodes of a network. Its two main advantages are scalability and fault tolerance. In the E-step, each node calculates local sufficient statistics from its local observations. A peer-to-peer algorithm then diffuses the local sufficient statistics to neighboring nodes so that each node obtains an estimate of the global sufficient statistics. In the M-step, each node updates the parameters of the Gaussian mixture model using its estimated global sufficient statistics. The proposed method is applied to environmental monitoring and to distributed target classification. Simulation results show promising performance.
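The E-step, diffusion, and M-step described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a one-dimensional, two-component mixture, a ring topology, and plain neighbor averaging as the diffusion step (equal-weight averaging only converges to the true network average when every node has the same number of neighbors, as on a ring). Node failures are not modeled. All function names, the data, and the initial parameters are hypothetical.

```python
import math
import random

def gauss_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def local_stats(data, weights, means, variances):
    """E-step at one node: per-component [sum r, sum r*x, sum r*x^2]."""
    k = len(weights)
    stats = [[0.0, 0.0, 0.0] for _ in range(k)]
    for x in data:
        p = [weights[j] * gauss_pdf(x, means[j], variances[j]) for j in range(k)]
        total = sum(p) or 1e-300  # guard against underflow to zero
        for j in range(k):
            r = p[j] / total  # responsibility of component j for x
            stats[j][0] += r
            stats[j][1] += r * x
            stats[j][2] += r * x * x
    return stats

def diffuse(all_stats, neighbors, rounds):
    """Assumed diffusion step: each round, every node replaces its
    statistics with the average over itself and its neighbors."""
    for _ in range(rounds):
        new = []
        for i, s in enumerate(all_stats):
            group = [s] + [all_stats[n] for n in neighbors[i]]
            new.append([[sum(g[j][m] for g in group) / len(group)
                         for m in range(3)] for j in range(len(s))])
        all_stats = new
    return all_stats

def m_step(stats):
    """M-step at one node from its estimate of the global statistics.
    Network averages work as well as sums because only ratios are used."""
    n = sum(s[0] for s in stats)
    weights = [s[0] / n for s in stats]
    means = [s[1] / s[0] for s in stats]
    variances = [max(s[2] / s[0] - (s[1] / s[0]) ** 2, 1e-6) for s in stats]
    return weights, means, variances

def distributed_em(node_data, neighbors, init, iters=30, rounds=50):
    params = [init for _ in node_data]  # every node starts from the same guess
    for _ in range(iters):
        stats = [local_stats(d, *p) for d, p in zip(node_data, params)]
        stats = diffuse(stats, neighbors, rounds)
        params = [m_step(s) for s in stats]
    return params

# Hypothetical setup: 6 nodes on a ring, each observing 100 samples drawn
# from an equal mixture of N(0, 1) and N(5, 1).
random.seed(0)
num_nodes = 6
node_data = [[random.gauss(0.0 if random.random() < 0.5 else 5.0, 1.0)
              for _ in range(100)] for _ in range(num_nodes)]
neighbors = [[(i - 1) % num_nodes, (i + 1) % num_nodes] for i in range(num_nodes)]
init = ([0.5, 0.5], [1.0, 4.0], [1.0, 1.0])  # (weights, means, variances)
params = distributed_em(node_data, neighbors, init)
```

After the run, `params[i]` holds node `i`'s own estimate of the mixture weights, means, and variances. Each node exchanges only a handful of numbers per component with its direct neighbors, never raw observations, which is what makes the scheme scale with network size.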
