GUISE: a uniform sampler for constructing frequency histogram of graphlets

Graphlet frequency distribution (GFD) has recently become popular for characterizing large networks. However, the computation of GFD for a network requires the exact count of embedded graphlets in that network, which is a computationally expensive task. As a result, it is practically infeasible to compute the GFD for even a moderately large network. In this paper, we propose Guise, which uses a Markov Chain Monte Carlo sampling method for constructing the approximate GFD of a large network. Our experiments on networks with millions of nodes show that Guise obtains the GFD with very low rate of error within few minutes, whereas the exhaustive counting-based approach takes several days.

[1]  Jake T. Lussier,et al.  Final Report : Local Structure and Evolution for Cascade Prediction , 2011 .

[2]  Mohammad Al Hasan,et al.  A Survey of Link Prediction in Social Networks , 2011, Social Network Data Analytics.

[3]  J. Coleman,et al.  Social Capital in the Creation of Human Capital , 1988, American Journal of Sociology.

[4]  Noshir S. Contractor,et al.  Is a friend a friend?: investigating the structure of friendship networks in virtual worlds , 2010, CHI Extended Abstracts.

[5]  Mong-Li Lee,et al.  NeMoFinder: dissecting genome-wide protein-protein interactions with meso-scale network motifs , 2006, KDD '06.

[6]  Michalis Faloutsos,et al.  On power-law relationships of the Internet topology , 1999, SIGCOMM '99.

[7]  Malik Magdon-Ismail,et al.  Discovering Hidden Groups in Communication Networks , 2004, ISI.

[8]  Aleksandar Stevanovic,et al.  GraphCrunch 2: Software tool for network modeling, alignment and clustering , 2011, BMC Bioinformatics.

[9]  Tijana Milenkoviæ,et al.  Uncovering Biological Network Function via Graphlet Degree Signatures , 2008, Cancer informatics.

[10]  Luca Becchetti,et al.  Efficient algorithms for large-scale local triangle counting , 2010, TKDD.

[11]  Uri Alon,et al.  Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs , 2004, Bioinform..

[12]  Joshua A. Grochow,et al.  Network Motif Discovery Using Subgraph Enumeration and Symmetry-Breaking , 2007, RECOMB.

[13]  Luca Becchetti,et al.  Efficient semi-streaming algorithms for local triangle counting in massive graphs , 2008, KDD.

[14]  Stanley Wasserman,et al.  Social Network Analysis: Methods and Applications , 1994, Structural analysis in the social sciences.

[15]  Sahar Asadi,et al.  Kavosh: a new algorithm for finding network motifs , 2009, BMC Bioinformatics.

[16]  John J Tyson,et al.  Functional motifs in biochemical reaction networks. , 2010, Annual review of physical chemistry.

[17]  Natasa Przulj,et al.  Biological network comparison using graphlet degree distribution , 2007, Bioinform..

[18]  Sebastian Wernicke,et al.  FANMOD: a tool for fast network motif detection , 2006, Bioinform..

[19]  B. Bollobás The evolution of random graphs , 1984 .

[20]  Alan M. Frieze,et al.  Random graphs , 2006, SODA '06.

[21]  Kurt Mehlhorn,et al.  Efficient graphlet kernels for large graph comparison , 2009, AISTATS.

[22]  Vojtech Rödl,et al.  A Fast Approximation Algorithm for Computing the Frequencies of Subgraphs in a Given Graph , 1995, SIAM J. Comput..

[23]  F. Schreiber,et al.  MODA: an efficient algorithm for network motif discovery in biological networks. , 2009, Genes & genetic systems.

[24]  Igor Jurisica,et al.  Efficient estimation of graphlet frequency distributions in protein-protein interaction networks , 2006, Bioinform..

[25]  A. Rbnyi ON THE EVOLUTION OF RANDOM GRAPHS , 2001 .

[26]  Fan Chung,et al.  Spectral Graph Theory , 1996 .

[27]  P. Erdos,et al.  On the evolution of random graphs , 1984 .

[28]  Ravi Montenegro,et al.  Mathematical Aspects of Mixing Times in Markov Chains , 2006, Found. Trends Theor. Comput. Sci..

[29]  Ellen W. Zegura,et al.  A quantitative comparison of graph-based models for Internet topology , 1997, TNET.

[30]  Duncan J. Watts,et al.  Collective dynamics of ‘small-world’ networks , 1998, Nature.

[31]  Albert,et al.  Emergence of scaling in random networks , 1999, Science.

[32]  Lawrence B. Holder,et al.  Graph-based approaches to insider threat detection , 2009, CSIIRW '09.

[33]  Daniel J. Brass,et al.  Network Analysis in the Social Sciences , 2009, Science.

[34]  Vladimir Vacic,et al.  Graphlet Kernels for Prediction of Functional Residues in Protein Structures , 2010, J. Comput. Biol..

[35]  Igor Jurisica,et al.  Modeling interactome: scale-free or geometric? , 2004, Bioinform..

[36]  Edoardo M. Airoldi,et al.  Graphlet decomposition of a weighted network , 2012, AISTATS.

[37]  Falk Schreiber,et al.  Frequency Concepts and Pattern Detection for the Analysis of Motifs in Networks , 2005, Trans. Comp. Sys. Biology.

[38]  Christos Faloutsos,et al.  Graphs over time: densification laws, shrinking diameters and possible explanations , 2005, KDD '05.

[39]  Jean-Pierre Eckmann,et al.  Curvature of co-links uncovers hidden thematic layers in the World Wide Web , 2001, Proceedings of the National Academy of Sciences of the United States of America.