Clustering Affiliation Inference from Graph Samples

Graph sampling is a widely used approach to addressing scalability when analyzing large graphs. Several promising cluster-preserving sampling algorithms have been proposed. However, once the clustering structure of a sampled graph is obtained, a method is still needed to infer the clustering affiliations of the remaining nodes in the original graph from the clustered nodes in the sampled subgraph. In this paper, we present a new two-stage clustering inference (TCI) method for this task. TCI is composed of two stages: 1) initialization of clustering affiliations for unsampled nodes based on computed neighborhood affiliation information; 2) label propagation over the whole graph. Our experimental results show that TCI, combined with any of the cluster-preserving sampling strategies considered, infers the clustering affiliations of the full node population well and outperforms the competing methods.
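The two stages can be illustrated with a minimal Python sketch. It is not the TCI implementation; the abstract does not specify how neighborhood affiliation information is computed or how propagation is scheduled. The sketch assumes an unweighted, undirected graph given as an adjacency dictionary, a majority vote over already-labelled neighbours for initialization, ties broken toward the smallest cluster id, and sampled nodes keeping their labels during propagation. The names `infer_affiliations`, `adj`, `sampled_labels`, and `max_iters` are hypothetical.

```python
from collections import Counter


def majority_label(node, adj, labels):
    """Most common label among labelled neighbours, or None if there are none.

    Ties go to the smallest cluster id (an assumption made for determinism).
    """
    counts = Counter(labels[u] for u in adj[node] if u in labels)
    if not counts:
        return None
    return min(counts, key=lambda c: (-counts[c], c))


def infer_affiliations(adj, sampled_labels, max_iters=20):
    """Illustrative two-stage inference; all design choices here are assumptions."""
    labels = dict(sampled_labels)

    # Stage 1: initialise unsampled nodes from their labelled neighbours,
    # sweeping outward from the sample until no further node can be labelled.
    pending = [v for v in adj if v not in labels]
    while pending:
        newly = {}
        for v in pending:
            best = majority_label(v, adj, labels)
            if best is not None:
                newly[v] = best
        if not newly:
            break  # the remaining nodes are not connected to any labelled node
        labels.update(newly)
        pending = [v for v in pending if v not in labels]

    # Stage 2: label propagation over the whole graph. Sampled nodes are
    # held fixed here (an assumption, not something the abstract states).
    for _ in range(max_iters):
        changed = False
        for v in sorted(adj):
            if v in sampled_labels:
                continue
            best = majority_label(v, adj, labels)
            if best is not None and labels.get(v) != best:
                labels[v] = best
                changed = True
        if not changed:
            break
    return labels


# Toy graph: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
sampled_labels = {0: 0, 5: 1}  # one clustered node sampled from each triangle
result = infer_affiliations(adj, sampled_labels)
```

On this toy graph each triangle inherits the label of its sampled node. Nodes with no path to any sampled node remain unlabelled in this sketch.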
