Graph sampling: Estimation of degree distributions

Online social networks and the World Wide Web lead to large underlying graphs that might not be completely known because of their size. To compute reliable statistics, we have to resort to sampling the network. In this paper, we investigate four network sampling methods to estimate the network degree distribution and the so-called biased degree distribution of a 3.7 million wireless subscriber network. We measure the quality of our estimates of the degree distributions by using the Kolmogorov-Smirnov statistic. Among all four sampling methods, node sampling yields Pareto optimal sample sizes in terms of the Kolomogorov-Smirnov statistic for the degree distribution, while node-by-edge sampling yields optimal sample sizes for the biased distribution. We also find that random walk sampling performs better than the Metropolis-Hastings random walk.

[1]  T. W. Anderson,et al.  Statistical Inference about Markov Chains , 1957 .

[2]  Matthew J. Salganik,et al.  5. Sampling and Estimation in Hidden Populations Using Respondent-Driven Sampling , 2004 .

[3]  Tanya Y. Berger-Wolf,et al.  Sampling community structure , 2010, WWW '10.

[4]  Eric D. Kolaczyk Sampling and Estimation in Network Graphs , 2009 .

[5]  Carsten Wiuf,et al.  Sampling properties of random graphs: the degree distribution. , 2005, Physical review. E, Statistical, nonlinear, and soft matter physics.

[6]  Minas Gjoka,et al.  Walking on a graph with a magnifying glass: stratified sampling via weighted random walks , 2011, PERV.

[7]  Philip Chan,et al.  Determining the number of clusters/segments in hierarchical clustering/segmentation algorithms , 2004, 16th IEEE International Conference on Tools with Artificial Intelligence.

[8]  I. Grosse,et al.  Analysis of symbolic sequences using the Jensen-Shannon divergence. , 2002, Physical review. E, Statistical, nonlinear, and soft matter physics.

[9]  Athina Markopoulou,et al.  On the bias of BFS , 2010, ArXiv.

[10]  D. Horvitz,et al.  A Generalization of Sampling Without Replacement from a Finite Universe , 1952 .

[11]  Minas Gjoka,et al.  Walking in Facebook: A Case Study of Unbiased Sampling of OSNs , 2010, 2010 Proceedings IEEE INFOCOM.

[12]  Walter Willinger,et al.  Sampling Techniques for Large, Dynamic Graphs , 2006, Proceedings IEEE INFOCOM 2006. 25TH IEEE International Conference on Computer Communications.

[13]  Christos Faloutsos,et al.  Sampling from large graphs , 2006, KDD '06.

[14]  Mark E. J. Newman,et al.  The Structure and Function of Complex Networks , 2003, SIAM Rev..