GUISE: Uniform Sampling of Graphlets for Large Graph Analysis

The graphlet frequency distribution (GFD) has recently become popular for characterizing large networks. However, computing the GFD of a network requires the exact count of the graphlets embedded in that network, which is a computationally expensive task. As a result, it is practically infeasible to compute the GFD for even a moderately large network. In this paper, we propose GUISE, which uses a Markov Chain Monte Carlo (MCMC) sampling method to construct an approximate GFD of a large network. Our experiments on networks with millions of nodes show that GUISE obtains the GFD within a few minutes, whereas the exhaustive counting-based approach takes several days.
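To make the idea of MCMC-based GFD estimation concrete, the following is a minimal sketch and not the GUISE algorithm itself. It assumes a simplified setting: only 3-node graphlets (open paths and triangles), a neighborhood in which two graphlets are adjacent when they share two vertices, and a standard Metropolis-Hastings acceptance rule that targets the uniform distribution over graphlets. All function names (`neighbors`, `classify`, `sample_gfd`) are hypothetical and chosen for illustration.

```python
import random


def edge_count(adj, nodes):
    """Number of edges among three nodes in an undirected graph."""
    a, b, c = nodes
    return (b in adj[a]) + (c in adj[a]) + (c in adj[b])


def classify(adj, state):
    """Label a connected 3-node set as a triangle or an open path."""
    return "triangle" if edge_count(adj, tuple(state)) == 3 else "path"


def neighbors(adj, state):
    """Connected 3-node sets that differ from `state` in exactly one vertex.

    Assumption: this neighborhood is an illustrative choice, not taken
    from the paper.
    """
    result = set()
    for removed in state:
        kept = state - {removed}
        # Any replacement vertex must touch at least one kept vertex.
        candidates = set().union(*(adj[v] for v in kept)) - state
        for added in candidates:
            new_state = frozenset(kept | {added})
            if edge_count(adj, tuple(new_state)) >= 2:  # connected
                result.add(new_state)
    return sorted(result, key=sorted)  # fixed order for reproducibility


def sample_gfd(adj, start, steps, rng):
    """Estimate the 3-node GFD with a Metropolis-Hastings random walk."""
    state = frozenset(start)
    nbrs = neighbors(adj, state)
    counts = {"path": 0, "triangle": 0}
    for _ in range(steps):
        if nbrs:
            proposal = rng.choice(nbrs)
            prop_nbrs = neighbors(adj, proposal)
            # Accept with probability min(1, d(state) / d(proposal)),
            # which makes the stationary distribution uniform.
            if rng.random() < len(nbrs) / len(prop_nbrs):
                state, nbrs = proposal, prop_nbrs
        counts[classify(adj, state)] += 1
    return {kind: n / steps for kind, n in counts.items()}


if __name__ == "__main__":
    # Toy graph: a triangle 0-1-2 with a pendant vertex 3 attached to 2.
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(sample_gfd(adj, (0, 1, 2), 20000, random.Random(0)))
```

On the toy graph there are three connected 3-node subgraphs, one of which is a triangle, so the estimated triangle fraction should approach 1/3 as the number of steps grows. The walk only ever inspects the local neighborhood of the current graphlet, which is why a sampling approach of this kind can avoid enumerating every graphlet in the network.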
