Counting stars and other small subgraphs in sublinear time

Detecting and counting copies of certain subgraphs (also known as <i>network motifs</i> or <i>graphlets</i>) is motivated by applications in a variety of areas, ranging from biology to the study of the World Wide Web. Several polynomial-time algorithms have been suggested for detecting or counting occurrences of certain network motifs. However, more efficient algorithms are needed when the input graph is very large, as is the case in many applications of motif counting. In this paper we design <i>sublinear-time</i> algorithms for approximating the number of copies of certain constant-size subgraphs in a graph <i>G</i>. That is, our algorithms do not read the whole graph, but rather query parts of it. Specifically, we consider algorithms that may query the degree of any vertex of their choice and may ask for any neighbor of any vertex of their choice. The main focus of this work is on the basic problem of counting the number of length-2 paths and, more generally, on counting the number of stars of a certain size. We design an algorithm that, given an approximation parameter 0 < ε < 1 and query access to a graph <i>G</i>, outputs an estimate <i>v̂</i><sub><i>s</i></sub> such that, with high constant probability, (1 − ε)<i>v</i><sub><i>s</i></sub>(<i>G</i>) ≤ <i>v̂</i><sub><i>s</i></sub> ≤ (1 + ε)<i>v</i><sub><i>s</i></sub>(<i>G</i>), where <i>v</i><sub><i>s</i></sub>(<i>G</i>) denotes the number of stars of size <i>s</i> + 1 in the graph. The expected query complexity and running time of the algorithm are [EQUATION] poly(log <i>n</i>, 1/ε). We also prove lower bounds showing that this algorithm is tight up to polylogarithmic factors in <i>n</i> and the dependence on ε. Our work extends the work of Feige (<i>SIAM Journal on Computing, 2006</i>) and Goldreich and Ron (<i>Random Structures and Algorithms, 2008</i>) on approximating the number of edges (or average degree) in a graph.
Combined with these results, our result can be used to obtain an estimate on the variance of the degrees in the graph and corresponding higher moments. In addition, we give some (negative) results on approximating the number of triangles and on approximating the number of length-3-paths in sublinear time.
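
The query model and the link between star counts and degree moments can be illustrated with a small Python sketch. It is not the algorithm of the paper: the class `GraphOracle` and the functions below are hypothetical names introduced only for illustration, and the estimator shown is the naive one that samples vertices uniformly and uses degree queries alone. The sketch relies on two standard identities for <i>s</i> ≥ 2: the number of stars with <i>s</i> leaves equals the sum over vertices of C(deg(<i>v</i>), <i>s</i>), and the sum of squared degrees equals 2<i>v</i><sub>2</sub>(<i>G</i>) + 2<i>m</i>, where <i>m</i> is the number of edges.

```python
import random
from math import comb


class GraphOracle:
    """Hypothetical query interface: only degree and neighbor queries are allowed."""

    def __init__(self, adj):
        self._adj = adj          # adjacency lists, hidden from the algorithm
        self.n = len(adj)        # the number of vertices is assumed to be known
        self.queries = 0         # number of queries issued so far

    def degree(self, v):
        self.queries += 1
        return len(self._adj[v])

    def neighbor(self, v, i):
        # Returns the i-th neighbor of v (0-based); unused by the naive estimator.
        self.queries += 1
        return self._adj[v][i]


def exact_star_count(adj, s):
    """Exact number of stars with s leaves (s >= 2): each vertex of degree d
    is the center of C(d, s) such stars. Reads the whole graph."""
    return sum(comb(len(nbrs), s) for nbrs in adj)


def naive_star_estimate(oracle, s, samples, rng):
    """Unbiased but naive estimator: average C(deg(v), s) over uniformly
    sampled vertices and scale by n. NOT the paper's algorithm."""
    total = 0
    for _ in range(samples):
        v = rng.randrange(oracle.n)
        total += comb(oracle.degree(v), s)
    return oracle.n * total / samples


def degree_variance(n, m, v2):
    """Variance of the degree sequence from the edge count m and the
    number v2 of length-2 paths, using sum(d^2) = 2*v2 + 2*m."""
    mean = 2 * m / n
    second_moment = (2 * v2 + 2 * m) / n
    return second_moment - mean ** 2


if __name__ == "__main__":
    # A 6-cycle: every vertex has degree 2, so it contains 6 length-2 paths.
    cycle = [[(i - 1) % 6, (i + 1) % 6] for i in range(6)]
    oracle = GraphOracle(cycle)
    print(exact_star_count(cycle, 2))
    print(naive_star_estimate(oracle, 2, samples=10, rng=random.Random(0)))
    print(oracle.queries)
```

On a regular graph such as the 6-cycle every sample contributes the same value, so the naive estimate is exact. On graphs where a few high-degree vertices hold most of the stars, uniform sampling can need many samples to hit them, which is one reason an algorithm in this model may also want neighbor queries.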

[1]  Uri Alon,et al.  Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs , 2004, Bioinformatics 20, 1746-1758.

[2]  Artur Czumaj,et al.  Estimating the Weight of Metric Minimum Spanning Trees in Sublinear Time , 2009, SIAM J. Comput..

[3]  Dana Ron,et al.  On Approximating the Minimum Vertex Cover in Sublinear Time and the Connection to Distributed Algorithms , 2007, Electron. Colloquium Comput. Complex..

[4]  Noga Alon,et al.  Finding and counting given length cycles , 1997, Algorithmica.

[5]  Andreas Björklund,et al.  Counting Paths and Packings in Halves , 2009, ESA.

[6]  Bernard Chazelle,et al.  Approximating the Minimum Spanning Tree Weight in Sublinear Time , 2001, ICALP.

[7]  Krzysztof Onak,et al.  Constant-Time Approximation Algorithms via Local Improvements , 2008, 2008 49th Annual IEEE Symposium on Foundations of Computer Science.

[8]  Richard M. Karp,et al.  Monte-Carlo algorithms for enumeration and reliability problems , 1983, 24th Annual Symposium on Foundations of Computer Science (sfcs 1983).

[9]  Ronitt Rubinfeld,et al.  Approximating the Weight of the Euclidean Minimum Spanning Tree in Sublinear Time , 2005, SIAM J. Comput..

[10]  Noga Alon,et al.  Biomolecular network motif counting and discovery by color coding , 2008, ISMB.

[11]  Süleyman Cenk Sahinalp,et al.  Not All Scale-Free Networks Are Born Equal: The Role of the Seed Graph in PPI Network Evolution , 2006, Systems Biology and Computational Proteomics.

[12]  Sebastian Wernicke,et al.  Efficient Detection of Network Motifs , 2006, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[13]  Igor Jurisica,et al.  Modeling interactome: scale-free or geometric? , 2004, Bioinform..

[14]  Noga Alon,et al.  Balanced Hashing, Color Coding and Approximate Counting , 2009, IWPEC.

[15]  Vojtech Rödl,et al.  A Fast Approximation Algorithm for Computing the Frequencies of Subgraphs in a Given Graph , 1995, SIAM J. Comput..

[16]  Roded Sharan,et al.  QPath: a method for querying pathways in a protein-protein interaction network , 2006, BMC Bioinformatics.

[17]  Dana Ron,et al.  Approximating average parameters of graphs , 2008, Random Struct. Algorithms.

[18]  David Hales,et al.  Motifs in evolving cooperative networks look like protein structure networks , 2008, Networks Heterog. Media.

[19]  Omid Amini,et al.  Counting Subgraphs via Homomorphisms , 2009, SIAM J. Discret. Math..

[20]  Jörg Flum,et al.  The parameterized complexity of counting problems , 2002, The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002. Proceedings..

[21]  Noga Alon,et al.  Color-coding , 1995, JACM.

[22]  Noga Alon,et al.  Balanced Families of Perfect Hash Functions and Their Applications , 2007, ICALP.

[23]  Uri Alon,et al.  Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs , 2004, Bioinform..

[24]  Uriel Feige,et al.  On Sums of Independent Random Variables with Unbounded Variance and Estimating the Average Degree in a Graph , 2006, SIAM J. Comput..

[25]  Ryan Williams,et al.  Finding, minimizing, and counting weighted subgraphs , 2009, STOC '09.

[26]  Joshua A. Grochow,et al.  Network Motif Discovery Using Subgraph Enumeration and Symmetry-Breaking , 2007, RECOMB.

[27]  Luca Becchetti,et al.  Efficient semi-streaming algorithms for local triangle counting in massive graphs , 2008, KDD.

[28]  Roded Sharan,et al.  QNet: A Tool for Querying Protein Interaction Networks , 2007, RECOMB.

[29]  Venkatesh Raman,et al.  Approximation Algorithms for Some Parameterized Counting Problems , 2002, ISAAC.

[30]  Dan Gusfield.  Introduction to the IEEE/ACM Transactions on Computational Biology and Bioinformatics , 2004 .

[31]  Dana Ron,et al.  Tight Bounds for Testing Bipartiteness in General Graphs , 2004, SIAM J. Comput..

[32]  Dana Ron,et al.  Property Testing in Bounded Degree Graphs , 1997, STOC.

[33]  Dana Ron,et al.  Comparing the strength of query types in property testing: the case of testing k-colorability , 2008, SODA '08.

[34]  Roded Sharan,et al.  Efficient Algorithms for Detecting Signaling Pathways in Protein Interaction Networks , 2006, J. Comput. Biol..

[35]  S. Shen-Orr,et al.  Network motifs: simple building blocks of complex networks. , 2002, Science.

[36]  Ryan Williams,et al.  Finding paths of length k in O*(2^k) time , 2008, Inf. Process. Lett..

[37]  Yuval Shavitt,et al.  Approximating the Number of Network Motifs , 2009, Internet Math..

[38]  Dana Ron,et al.  Counting stars and other small subgraphs in sublinear time , 2010, SODA '10.

[39]  Yuichi Yoshida,et al.  An improved constant-time approximation algorithm for maximum matchings , 2009, STOC '09.

[40]  Uriel Feige,et al.  On sums of independent random variables with unbounded variance, and estimating the average degree in a graph , 2004, STOC '04.

[41]  Ioannis Koutis,et al.  Faster Algebraic Algorithms for Path and Packing Problems , 2008, ICALP.