Vertex Clustering of Augmented Graph Streams

In this paper we propose a graph stream clustering algorithm with a unified similarity measure on both structural and attribute properties of vertices, with each attribute being treated as a vertex. Unlike others, our approach does not require an input parameter for the number of clusters, instead, it dynamically creates new sketch-based clusters and periodically merges existing similar clusters. Experiments on two publicly available datasets reveal the advantages of our approach in detecting vertex clusters in the graph stream. We provide a detailed investigation into how parameters affect the algorithm performance. We also provide a quantitative evaluation and comparison with a well-known offline community detection algorithm which shows that our streaming algorithm can achieve comparable or better average cluster purity.

[1]  Gerhard Widmer,et al.  Learning in the Presence of Concept Drift and Hidden Contexts , 1996, Machine Learning.

[2]  Jean-Loup Guillaume,et al.  Fast unfolding of communities in large networks , 2008, 0803.0476.

[3]  Hong Cheng,et al.  Graph Clustering Based on Structural/Attribute Similarities , 2009, Proc. VLDB Endow..

[4]  Anil K. Jain Data clustering: 50 years beyond K-means , 2008, Pattern Recognit. Lett..

[5]  Jure Leskovec,et al.  Community Detection in Networks with Node Attributes , 2013, 2013 IEEE 13th International Conference on Data Mining.

[6]  WidmerGerhard,et al.  Learning in the presence of concept drift and hidden contexts , 1996 .

[7]  Brian W. Kernighan,et al.  An efficient heuristic procedure for partitioning graphs , 1970, Bell Syst. Tech. J..

[8]  Philip S. Yu,et al.  On Graph Stream Clustering with Side Information , 2013, SDM.

[9]  Philip S. Yu,et al.  Dynamic Community Detection in Weighted Graph Streams , 2013, SDM.

[10]  Graham Cormode,et al.  Mergeable summaries , 2012, PODS '12.

[11]  Philip S. Yu,et al.  Outlier detection in graph streams , 2011, 2011 IEEE 27th International Conference on Data Engineering.

[12]  Graham Cormode,et al.  An improved data stream summary: the count-min sketch and its applications , 2004, J. Algorithms.

[13]  D.M. Mount,et al.  An Efficient k-Means Clustering Algorithm: Analysis and Implementation , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[14]  Hong Cheng,et al.  A model-based approach to attributed graph clustering , 2012, SIGMOD Conference.

[15]  André Carlos Ponce de Leon Ferreira de Carvalho,et al.  Data stream clustering: A survey , 2013, CSUR.

[16]  Hong Cheng,et al.  Clustering Large Attributed Graphs: An Efficient Incremental Approach , 2010, 2010 IEEE International Conference on Data Mining.

[17]  Charalampos E. Tsourakakis,et al.  FENNEL: streaming graph partitioning for massive scale graphs , 2014, WSDM.

[18]  Philip S. Yu,et al.  On dense pattern mining in graph streams , 2010, Proc. VLDB Endow..

[19]  Bo Zhao,et al.  Community evolution detection in dynamic heterogeneous information networks , 2010, MLG '10.

[20]  Santo Fortunato,et al.  Community detection in graphs , 2009, ArXiv.

[21]  Thierry Bertin-Mahieux,et al.  The Million Song Dataset , 2011, ISMIR.

[22]  Kun-Lung Wu,et al.  Efficient processing of streaming graphs for evolution-aware clustering , 2013, CIKM.

[23]  Charu C. Aggarwal,et al.  On Classification of Graph Streams , 2011, SDM.

[24]  N. Metropolis,et al.  An Efficient Heuristic Procedure for Partitioning Graphs , 2017 .

[25]  Philip S. Yu,et al.  On Clustering Graph Streams , 2010, SDM.