Sublinear Time Estimation of Degree Distribution Moments: The Degeneracy Connection

We revisit the classic problem of estimating the degree distribution moments of an undirected graph. Consider an undirected graph $G=(V,E)$ with $n$ vertices, and define (for $s > 0$) $\mu_s = \frac{1}{n}\cdot\sum_{v \in V} d^s_v$, where $d_v$ denotes the degree of vertex $v$. Our aim is to estimate $\mu_s$ within a multiplicative error of $(1+\epsilon)$ (for a given approximation parameter $\epsilon>0$) in sublinear time. We consider the sparse graph model, which allows access to uniform random vertices, queries for the degree of any vertex, and queries for a neighbor of any vertex. For the case of $s=1$ (the average degree), $\widetilde{O}(\sqrt{n})$ queries suffice for any constant $\epsilon$ (Feige, SICOMP 06 and Goldreich-Ron, RSA 08). Gonen-Ron-Shavitt (SIDMA 11) extended this result to all integral $s > 0$ by designing an algorithm that performs $\widetilde{O}(n^{1-1/(s+1)})$ queries. We design a new, significantly simpler algorithm for this problem. In the worst case, it exactly matches the bounds of Gonen-Ron-Shavitt and has a much simpler proof. More importantly, the running time of this algorithm is connected to the degeneracy of $G$, which is (essentially) the maximum density of an induced subgraph. For the family of graphs with degeneracy at most $\alpha$, the algorithm has a query complexity of $\widetilde{O}\left(\frac{n^{1-1/s}}{\mu^{1/s}_s} \Big(\alpha^{1/s} + \min\{\alpha,\mu^{1/s}_s\}\Big)\right) = \widetilde{O}(n^{1-1/s}\alpha/\mu^{1/s}_s)$. Thus, for the class of bounded degeneracy graphs (which includes all minor-closed families and preferential attachment graphs), we can estimate the average degree in $\widetilde{O}(1)$ queries and the variance of the degree distribution in $\widetilde{O}(\sqrt{n})$ queries. This is a major improvement over the previous worst-case bounds. Our key insight is in designing an estimator for $\mu_s$ that has low variance when $G$ does not have large dense subgraphs.
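
To make the two quantities in the abstract concrete, the following minimal Python sketch computes $\mu_s$ exactly and the degeneracy of a small graph. It is not the sublinear-time estimator described above: it reads the whole graph, and it finds the degeneracy by the standard procedure of repeatedly removing a minimum-degree vertex. The function names `degree_moment` and `degeneracy`, and the dict-of-sets graph representation, are assumptions made for illustration only.

```python
def degree_moment(adj, s):
    """Exact mu_s = (1/n) * sum over vertices v of d_v^s.

    `adj` maps each vertex to the set of its neighbors (an assumed,
    illustrative representation of an undirected graph).
    """
    n = len(adj)
    return sum(len(nbrs) ** s for nbrs in adj.values()) / n


def degeneracy(adj):
    """Degeneracy by min-degree peeling.

    Repeatedly remove a vertex of minimum remaining degree; the largest
    degree seen at removal time is the degeneracy. This simple version
    takes quadratic time and is meant for small examples only.
    """
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    removed = set()
    best = 0
    while len(removed) < len(adj):
        # Pick a not-yet-removed vertex with the smallest remaining degree.
        v = min((u for u in adj if u not in removed), key=deg.__getitem__)
        best = max(best, deg[v])
        removed.add(v)
        for w in adj[v]:
            if w not in removed:
                deg[w] -= 1
    return best


# A star with center 0 and four leaves: degrees are 4, 1, 1, 1, 1.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}

print(degree_moment(star, 1))  # average degree: 8 / 5 = 1.6
print(degree_moment(star, 2))  # second moment: (16 + 4) / 5 = 4.0
print(degeneracy(star))        # 1, since a star is a tree
```

The star illustrates why degeneracy is the relevant parameter: its maximum degree grows with $n$, yet its degeneracy stays at 1, so it falls in the bounded degeneracy class for which the abstract states the improved query bounds.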
