ESCAPE: Efficiently Counting All 5-Vertex Subgraphs

Counting the frequency of small subgraphs is a fundamental technique in network analysis across various domains, most notably in bioinformatics and social networks. The special case of triangle counting has received much attention. Getting results for 4-vertex or 5-vertex patterns is highly challenging, and few known practical methods scale to massive graphs. We introduce an algorithmic framework that can be adopted to count any small pattern in a graph, and we apply this framework to compute exact counts for all 5-vertex subgraphs. Our framework is built on cutting a pattern into smaller ones and using the counts of the smaller patterns to obtain the counts of the larger ones. Furthermore, we exploit degree orientations of the graph to reduce runtimes even further. These methods avoid the combinatorial explosion that typical subgraph counting algorithms face. We prove that it suffices to enumerate only four specific subgraphs (three of which have fewer than 5 vertices) to exactly count all 5-vertex patterns. We perform extensive empirical experiments on a variety of real-world graphs. We are able to compute counts for graphs with tens of millions of edges in minutes on a commodity machine. To the best of our knowledge, this is the first practical algorithm for 5-vertex pattern counting that runs at this scale. A stepping stone to our main algorithm is a fast method for counting all 4-vertex patterns. This algorithm is typically ten times faster than state-of-the-art 4-vertex counters.
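The two ideas named in the abstract, degree orientation and deriving larger counts from smaller ones, can be illustrated on a very small scale. The sketch below is not the authors' implementation: the function names (`degree_orient`, `count_small_patterns`) and the adjacency-dict input format are assumptions made for illustration, and it covers only a few non-induced patterns on at most 4 vertices. Triangles are enumerated once each using a degree orientation, and the tailed-triangle count is then obtained from per-vertex triangle counts and degrees without enumerating any 4-vertex subgraph.

```python
from itertools import combinations
from math import comb


def degree_orient(adj):
    """Direct each edge toward the endpoint with the higher (degree, id) rank."""
    rank = {v: (len(adj[v]), v) for v in adj}
    return {v: {u for u in adj[v] if rank[u] > rank[v]} for v in adj}


def count_small_patterns(adj):
    """Return non-induced counts of a few small patterns.

    adj maps each vertex to the set of its neighbours (undirected, simple graph).
    """
    out = degree_orient(adj)

    # Enumerate triangles: each one is found exactly once, at its lowest-ranked vertex.
    tri_at = {v: 0 for v in adj}
    triangles = 0
    for v in adj:
        for u, w in combinations(out[v], 2):
            if w in adj[u]:
                triangles += 1
                for x in (v, u, w):
                    tri_at[x] += 1

    # Patterns whose counts follow from degrees alone.
    wedges = sum(comb(len(adj[v]), 2) for v in adj)
    three_stars = sum(comb(len(adj[v]), 3) for v in adj)

    # Tailed triangle = triangle plus one extra edge at a triangle vertex.
    # "Cutting" at that vertex gives: triangles through v times (deg(v) - 2) tails.
    tailed_triangles = sum(tri_at[v] * (len(adj[v]) - 2) for v in adj)

    return {
        "triangles": triangles,
        "wedges": wedges,
        "three_stars": three_stars,
        "tailed_triangles": tailed_triangles,
    }


if __name__ == "__main__":
    # A triangle 0-1-2 with a pendant edge 2-3.
    graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(count_small_patterns(graph))
```

Only the triangle step touches more than one vertex at a time; every other count is a single pass over per-vertex quantities, which is the sense in which smaller counts are reused to avoid enumerating larger patterns.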
