Distributed Clustering Algorithm in Sensor Networks via Normalized Information Measures

Distributed data clustering in sensor networks has received increasing attention as network technology has developed, and a variety of algorithms for it have been proposed recently. However, most of these algorithms struggle with either clustering non-Gaussian-shaped data or selecting the model order. To address both problems simultaneously, we propose a novel discriminative clustering algorithm based on normalized information measures, with a rigorous convergence analysis, and then extend it to a distributed version by borrowing algorithms from the multi-agent consensus community. More specifically, we first take the normalized information distance (NID) between the cluster data and the cluster labels as the objective function; minimizing it yields a Minimum Normalized Information Distance-based (MNID) algorithm capable of both non-Gaussian data clustering and model selection. Next, to implement the MNID algorithm in a distributed manner, we employ finite-time multi-agent consensus algorithms over the sensor network to calculate the global model parameters, with only local intermediate variables exchanged between one-hop neighbors. Both the centralized and the distributed MNID algorithms are rigorously proved to converge. Finally, the validity of the proposed algorithms is demonstrated through numerical tests on both synthetic and real data.
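
To make the normalized information distance concrete, the following minimal sketch computes one common discrete form, NID(X, Y) = 1 - I(X; Y) / max(H(X), H(Y)), between two label sequences. It is an illustration only: the abstract does not state how the NID between cluster data and cluster labels is estimated in the MNID algorithm, so the plug-in (empirical frequency) estimator and the function names `entropy`, `mutual_information`, and `nid` are assumptions made here, not the authors' implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Empirical Shannon entropy (in nats) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two label sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # p(a,b) * log( p(a,b) / (p(a) p(b)) ), written with raw counts.
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def nid(x, y):
    """Normalized information distance: 1 - I(X;Y) / max(H(X), H(Y))."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    h_max = max(entropy(x), entropy(y))
    if h_max == 0.0:
        # Both sequences are constant, so they carry identical (zero) information.
        return 0.0
    return 1.0 - mutual_information(x, y) / h_max


if __name__ == "__main__":
    same_up_to_relabeling = nid([0, 0, 1, 1], [1, 1, 0, 0])
    independent = nid([0, 0, 1, 1], [0, 1, 0, 1])
    print(same_up_to_relabeling, independent)
```

Under this definition, two partitions that agree up to a relabeling have a distance of (numerically) zero, while statistically independent partitions have a distance of one, which is why minimizing the NID favors labels that are maximally informative about the data.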
