Enhancing Community Detection for Big Sensor Data Clustering via Hyperbolic Network Embedding

In this paper we present a novel big data clustering approach for measurements obtained from pervasive sensor networks. To address the potentially very large scale of such datasets, we map the data clustering problem to a community detection problem. Datasets are cast as graphs that represent the relations among individual observations, and data clustering is mapped to node clustering (community detection) in the data graph. We propose a novel computational approach for enhancing the traditional Girvan-Newman (GN) community detection algorithm via hyperbolic network embedding. The data dependency graph is embedded in hyperbolic space via Rigel embedding, making it possible to compute more efficiently the hyperbolic edge-betweenness centrality (HEBC) required by the modified GN algorithm. This allows the nodes of the data graph to be clustered more efficiently without significantly sacrificing accuracy. We demonstrate the efficacy of our approach on artificial network and data topologies and on real benchmark datasets. The proposed methodology can be used for efficient clustering of datasets obtained from massive pervasive smart city/building sensor networks, such as the FIESTA-IoT platform, and exploited in applications such as lower-cost sensing.
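The graph-based pipeline summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it builds a k-nearest-neighbour graph from raw observations (one possible way to form a data graph, assumed here for illustration) and runs the classical Girvan-Newman procedure with exact edge betweenness. The Rigel embedding and the HEBC computation are not reproduced; the `centrality` parameter only marks the place where a cheaper, embedding-based estimate could be substituted. All function and parameter names (`knn_graph`, `edge_betweenness`, `girvan_newman`, `k`, `n_communities`) are hypothetical.

```python
import math
from collections import deque


def knn_graph(points, k):
    """Undirected k-nearest-neighbour graph as {node: set(neighbours)}.
    Assumption: Euclidean distance; the paper may build its graph differently."""
    adj = {i: set() for i in range(len(points))}
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        for j in sorted(others, key=lambda j: math.dist(p, points[j]))[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def edge_betweenness(adj):
    """Exact shortest-path edge betweenness (Brandes-style accumulation).
    Each unordered node pair is counted twice, which does not affect ranking."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist = {s: 0}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        sigma[s] = 1
        preds = {v: [] for v in adj}  # predecessors on shortest paths
        order = []
        queue = deque([s])
        while queue:                  # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):     # accumulate dependencies backwards
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += c
                delta[v] += c
    return score


def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def girvan_newman(adj, n_communities, centrality=edge_betweenness):
    """Remove the highest-centrality edge until n_communities components exist.
    `centrality` is the hook where an approximate (e.g. HEBC-like) score
    could replace the exact computation."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    comps = components(adj)
    while len(comps) < n_communities and any(adj.values()):
        scores = centrality(adj)
        u, v = max(scores, key=scores.get)
        adj[u].discard(v)
        adj[v].discard(u)
        comps = components(adj)
    return comps


# Two triangles joined by a single bridge edge (2, 3).
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
communities = girvan_newman(toy, 2)
```

On the toy graph the bridge edge carries every shortest path between the two triangles, so it is removed first and the two triangles are returned as communities. Recomputing exact edge betweenness after every removal is the expensive step of this baseline, and it is the step that the embedding-based HEBC is meant to make cheaper.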
