How Large Is Your Graph?

We consider the problem of estimating the size of a graph when one is given only local access to it. We formally define a query model in which one starts with a seed node and is allowed to make queries about the neighbours of nodes that have already been seen. In the case of undirected graphs, an estimator of Katzir et al. (2014) based on a sample from the stationary distribution $\pi$ uses $O(1/\|\pi\|_2 + d_{\mathrm{avg}})$ queries, where $d_{\mathrm{avg}}$ is the average degree; we prove that this is tight. In addition, we establish this as a lower bound even when the algorithm is allowed to crawl the graph arbitrarily; the results of Katzir et al. give an upper bound that is worse by a multiplicative factor of $t_{\mathrm{mix}}(1/n^4)$, where $t_{\mathrm{mix}}(\varepsilon)$ denotes the mixing time to accuracy $\varepsilon$.

The picture becomes significantly different in the case of directed graphs. We show that, without strong assumptions on the graph structure, the number of nodes cannot be estimated to within a constant multiplicative factor without using a number of queries that is at least linear in the number of nodes; in particular, rapid mixing and small diameter, properties that most real-world networks exhibit, do not suffice. The question of interest is whether any algorithm can beat breadth-first search. We introduce a new parameter, generalising the well-studied conductance, such that if a suitable bound on it exists and is known to the algorithm, the number of queries required is sublinear in the number of edges; we show that this is tight.
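The undirected-graph estimator mentioned above can be illustrated with a minimal Python sketch. It assumes a small in-memory adjacency list standing in for the hidden graph. The names `LocalGraph`, `bfs_count`, `walk_samples` and `collision_estimate` are hypothetical. So are three other choices: a lazy random walk, a fixed `burn_in` used in place of a mixing time, and the convention that repeated queries on the same node are free. These are illustrative assumptions, not the paper's construction. The estimate is a collision estimator of the kind used by Katzir et al. Given samples $x_1, \dots, x_r$ from $\pi(v) = d(v)/2m$, let $\Psi_1 = \sum_i d(x_i)$ and $\Psi_{-1} = \sum_i 1/d(x_i)$. Let $C$ be the number of unordered pairs $i < j$ with $x_i = x_j$. Since $\mathbb{E}[\Psi_1]\,\mathbb{E}[\Psi_{-1}] = r^2 n \|\pi\|_2^2$ and $\mathbb{E}[C] = \binom{r}{2}\|\pi\|_2^2$, the quantity $\hat n = \Psi_1 \Psi_{-1} / (2C)$ is a natural estimate of $n$ once a few collisions have been observed. By the birthday paradox, that takes on the order of $1/\|\pi\|_2$ samples. For comparison, `bfs_count` returns $n$ exactly but queries every node.

```python
import random
from collections import Counter, deque


class LocalGraph:
    """Hypothetical oracle for the local query model.

    The algorithm starts at `seed` and may only ask for the neighbours of
    nodes it has already seen. Assumption: repeated queries on one node are
    free (an algorithm could cache them), so `queries` counts the distinct
    nodes whose neighbourhoods have been revealed.
    """

    def __init__(self, adjacency, seed):
        self._adj = adjacency  # hidden graph: node -> list of neighbours
        self.seed = seed
        self.seen = {seed}
        self._queried = set()

    @property
    def queries(self):
        return len(self._queried)

    def neighbours(self, v):
        if v not in self.seen:
            raise ValueError(f"node {v!r} has not been seen yet")
        self._queried.add(v)
        self.seen.update(self._adj[v])
        return list(self._adj[v])


def bfs_count(g):
    """Exact baseline: breadth-first search over the seed's component."""
    visited = {g.seed}
    queue = deque([g.seed])
    while queue:
        for w in g.neighbours(queue.popleft()):
            if w not in visited:
                visited.add(w)
                queue.append(w)
    return len(visited)


def walk_samples(g, r, burn_in, rng):
    """Approximate samples from pi(v) = d(v) / 2m via a lazy random walk.

    `burn_in` stands in for the mixing time and must be chosen by the
    caller; the walk records its position every `burn_in` steps.
    """
    v, samples = g.seed, []
    while len(samples) < r:
        for _ in range(burn_in):
            if rng.random() < 0.5:  # lazy step: stay put, avoids periodicity
                continue
            v = rng.choice(g.neighbours(v))
        samples.append(v)
    return samples


def collision_estimate(samples, degree):
    """Return Psi_1 * Psi_{-1} / (2 C), or None if no collision occurred.

    Psi_1 = sum of d(x_i), Psi_{-1} = sum of 1 / d(x_i), and C is the number
    of unordered pairs i < j with x_i == x_j.
    """
    collisions = sum(c * (c - 1) // 2 for c in Counter(samples).values())
    if collisions == 0:
        return None  # no collision yet: more samples are needed
    psi_plus = sum(degree(v) for v in samples)
    psi_minus = sum(1.0 / degree(v) for v in samples)
    return psi_plus * psi_minus / (2 * collisions)


# Usage on the complete bipartite graph K_{50,150} (n = 200, degrees 150 and 50).
left, right = range(50), range(50, 200)
adj = {u: list(right) for u in left}
adj.update({u: list(left) for u in right})

g = LocalGraph(adj, seed=0)
samples = walk_samples(g, r=400, burn_in=20, rng=random.Random(0))
estimate = collision_estimate(samples, lambda v: len(g.neighbours(v)))
print(f"estimate ~ {estimate:.1f} after {g.queries} distinct node queries")
```

The example graph is deliberately non-regular, so the degree weighting in $\Psi_1$ and $\Psi_{-1}$ matters. It is also bipartite, which is why the sketch uses a lazy walk: a plain walk would alternate sides forever and never converge to $\pi$.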

[1] Amin Saberi, et al. Random Walks with Lookahead on Power Law Random Graphs, 2006, Internet Math.

[2] Gregory Valiant, et al. Estimating the unseen: an n/log(n)-sample estimator for entropy and support size, shown optimal via new CLTs, 2011, STOC '11.

[3] Michael Kearns, et al. Local Algorithms for Finding Interesting Individuals in Large Networks, 2010, ICS.

[4] Alan M. Frieze, et al. Crawling on web graphs, 2002, STOC '02.

[5] Gregory Valiant, et al. Estimating the Unseen, 2017, J. ACM.

[6] Nancy A. Lynch, et al. Ant-Inspired Density Estimation via Random Walks: Extended Abstract, 2016, PODC.

[7] Edo Liberty, et al. Estimating Sizes of Social Networks via Biased Sampling, 2014, Internet Math.

[8] Oded Goldreich, et al. Introduction to Testing Graph Properties, 2010, Property Testing.

[9] William Feller, et al. An Introduction to Probability Theory and Its Applications, 1951.

[10] Pan Peng, et al. Relating two property testing models for bounded degree directed graphs, 2016, STOC.

[11] Michael A. Bender, et al. Testing properties of directed graphs: acyclicity and connectivity, 2002, Random Struct. Algorithms.

[12] Anirban Dasgupta, et al. On estimating the average degree, 2014, WWW.

[13] Elizabeth L. Wilmer, et al. Markov Chains and Mixing Times, 2008.

[14] Leonard Pitt. A Note on Extending Knuth's Tree Estimator to Directed Acyclic Graphs, 1987, Inf. Process. Lett.

[15] Liran Katzir, et al. Estimating clustering coefficients and size of social networks via random walk, 2013, TWEB.

[16] I. Good. The Population Frequencies of Species and the Estimation of Population Parameters, 1953.

[17] D. Knuth. Estimating the efficiency of backtrack programs, 1974.

[18] Béla Bollobás, et al. A Probabilistic Proof of an Asymptotic Formula for the Number of Labelled Regular Graphs, 1980, Eur. J. Comb.

[19] Howard G. Tucker, et al. Confidence intervals for the number of unseen types, 1998.

[20] Colin Cooper, et al. Estimating network parameters using random walks, 2012, 2012 Fourth International Conference on Computational Aspects of Social Networks (CASoN).