Towards a Decomposition-Optimal Algorithm for Counting and Sampling Arbitrary Motifs in Sublinear Time

We consider the problem of sampling and approximately counting an arbitrary given motif H in a graph G, where access to G is given via degree, neighbor, and pair queries, as well as uniform edge sample queries. Previous algorithms for these tasks were based on a decomposition of H into a collection of odd cycles and stars, denoted D*(H) = {O_{k_1}, ..., O_{k_q}, S_{p_1}, ..., S_{p_ℓ}}. These algorithms were shown to be optimal when H is a clique or an odd-length cycle, but no other lower bounds were known. We present a new algorithm for sampling and approximately counting arbitrary motifs which, up to poly(log n) factors, is always at least as good as previous results, and for most graphs G is strictly better. The main ingredient leading to this improvement is an improved uniform algorithm for sampling stars, which might be of independent interest, as it allows sampling vertices according to the p-th moment of the degree distribution. Finally, we prove that this algorithm is decomposition-optimal for decompositions that contain at least one odd cycle. These are the first lower bounds for motifs H with a nontrivial decomposition, i.e., motifs that have more than a single component in their decomposition.
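To make the query model and the phrase "sample vertices according to the p-th moment of the degree distribution" concrete, the following is a minimal Python sketch. It is not the algorithm of this paper: it is a naive rejection sampler that assumes an upper bound d_max on the maximum degree is known in advance and that p ≥ 1. The class GraphOracle, the function names, and the parameter d_max are hypothetical and serve only as illustration.

```python
import random
from collections import Counter


class GraphOracle:
    """Toy in-memory stand-in for the query model (hypothetical helper).

    Supports degree, neighbor, pair, and uniform edge sample queries.
    """

    def __init__(self, edges):
        self.edges = list(edges)
        self.adj = {}
        for u, v in self.edges:
            self.adj.setdefault(u, []).append(v)
            self.adj.setdefault(v, []).append(u)

    def degree(self, v):
        return len(self.adj[v])

    def neighbor(self, v, i):
        # i-th neighbor of v, for 0 <= i < degree(v)
        return self.adj[v][i]

    def pair(self, u, v):
        return v in self.adj[u]

    def sample_edge(self, rng):
        # Uniform edge sample query
        return rng.choice(self.edges)


def sample_vertex_by_moment(g, p, d_max, rng):
    """Return a vertex v with probability proportional to degree(v) ** p.

    Assumption (not from the paper): d_max >= maximum degree and p >= 1.
    """
    while True:
        u, v = g.sample_edge(rng)
        # A random endpoint of a uniform edge is w with probability d(w) / (2m).
        w = u if rng.random() < 0.5 else v
        # Accepting with probability (d(w) / d_max) ** (p - 1) makes the
        # overall acceptance probability proportional to d(w) ** p.
        if rng.random() < (g.degree(w) / d_max) ** (p - 1):
            return w


def empirical_distribution(g, p, d_max, trials, rng):
    """Fraction of trials in which each vertex was returned."""
    counts = Counter(sample_vertex_by_moment(g, p, d_max, rng) for _ in range(trials))
    return {v: c / trials for v, c in counts.items()}
```

The expected number of edge samples per accepted vertex in this sketch grows as d_max increases relative to typical degrees, which is the kind of overhead a dedicated star-sampling procedure aims to avoid.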
