FURS: Fast and Unique Representative Subset selection retaining large-scale community structure

We propose a novel algorithm, FURS (Fast and Unique Representative Subset selection), that deterministically selects a set of nodes from a given graph while retaining the underlying community structure. FURS greedily selects nodes with high degree centrality from most or all of the communities in the network. Within each community, nodes with high degree centrality are usually located at the center rather than the periphery, and therefore capture the community structure better. The selected nodes are not isolated, but they can form disconnected components. We evaluate FURS using quality measures such as coverage, clustering coefficients, degree distributions and variation of information. Empirically, we observe that the selected nodes retain most or all of the communities in the original network. We compare FURS with state-of-the-art sampling techniques, namely SlashBurn, Forest-Fire, Metropolis and Snowball Expansion. We evaluate FURS on several synthetic and real-world networks of varying size to demonstrate that it produces high-quality subsets that preserve the community structure. The subset generated by FURS can be used effectively by model-based approaches with out-of-sample extension properties to infer the community affiliation of nodes in large-scale networks. A consequence of FURS is that the selected subset is also a good candidate set for a simple diffusion model. For several real-world networks, we compare the spread of information over time from the FURS subset with that from subsets chosen by random node selection, hubs selection, spokes selection, high eigenvector centrality, high PageRank, high betweenness centrality and low betweenness centrality.
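The greedy, degree-based selection described above can be illustrated with a small sketch. This is a minimal, hypothetical version under our own assumption about how the greedy step spreads picks across communities: repeatedly take the highest-degree active node, deactivate its neighbours for the rest of the pass, and start a new pass once no active nodes remain. The function names `furs_sketch` and `build_adjacency`, the tie-break by smallest node id, and the adjacency-dict input are illustrative choices, not the published implementation.

```python
from typing import Dict, Iterable, List, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[int, int]]) -> Dict[int, Set[int]]:
    """Build an undirected adjacency dict from an edge list."""
    adj: Dict[int, Set[int]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def furs_sketch(adj: Dict[int, Set[int]], subset_size: int) -> List[int]:
    """Hypothetical greedy degree-based selection with neighbour deactivation.

    Assumption: this follows our reading of the greedy idea, not the
    authors' exact procedure.
    """
    if not 0 <= subset_size <= len(adj):
        raise ValueError("subset_size must be between 0 and the number of nodes")
    selected: List[int] = []
    remaining = set(adj)  # nodes that have not been selected yet
    while len(selected) < subset_size:
        # Start a new pass: every unselected node becomes active again.
        active = set(remaining)
        while active and len(selected) < subset_size:
            # Highest degree first; ties go to the smallest node id,
            # which keeps the selection deterministic.
            node = max(active, key=lambda v: (len(adj[v]), -v))
            selected.append(node)
            remaining.discard(node)
            active.discard(node)
            # Deactivate the neighbours for the rest of this pass so the
            # next pick tends to come from a different region of the graph.
            active -= adj[node]
    return selected


# Toy graph: a dense group {0, 1, 2, 3, 4} and a star around hub 5,
# joined by the single edge (4, 5).
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4),
         (4, 5), (5, 6), (5, 7), (5, 8)]
adj = build_adjacency(edges)
print(furs_sketch(adj, 2))  # [0, 5]
print(furs_sketch(adj, 3))  # [0, 5, 1]
```

On this toy graph, plain top-2 degree selection would return nodes 0 and 1, both in the dense group. Deactivating neighbours pushes the second pick to node 5, the hub of the other group, which matches the intent of selecting high-degree nodes from most or all communities.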
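Variation of information is one of the quality measures listed above. The sketch below computes Meilă's definition, VI(A, B) = H(A) + H(B) - 2 I(A; B), in nats for two hard partitions of the same node set. Which two partitions are compared (for example, communities detected on the selected subset versus the original communities restricted to that subset) is our assumption; the function name `variation_of_information` is illustrative.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def variation_of_information(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """VI(A, B) = H(A) + H(B) - 2 I(A; B), computed in nats.

    a[i] and b[i] are the community labels of node i under the two
    partitions being compared.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("partitions must be non-empty and label the same nodes")
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    count_ab = Counter(zip(a, b))  # sizes of pairwise cluster intersections

    def entropy(counts: Counter) -> float:
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    # I(A; B) = sum over (x, y) of p(x, y) * log(p(x, y) / (p(x) p(y)))
    mutual_info = sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )
    return entropy(count_a) + entropy(count_b) - 2.0 * mutual_info


# Same partition under different label names: VI is (numerically) zero.
print(variation_of_information([0, 0, 1, 1], ["x", "x", "y", "y"]))
# One community versus four singletons: VI equals ln 4.
print(variation_of_information([0, 0, 0, 0], [0, 1, 2, 3]))
```

Lower values mean closer agreement; VI is zero exactly when the two partitions are identical up to relabeling.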
