Comparing Graph Sampling Methods Based on the Number of Queries

Random walk-based graph sampling methods can effectively estimate feature values in large-scale social networks in which the node IDs are unknown. Real social networks are sampled by repeatedly querying their APIs to acquire the lists of adjacent nodes. These queries can become a bottleneck in the sampling process because nearly all social network services restrict the rate at which queries can be issued. However, most existing graph sampling studies have compared methods based on sample size rather than on the number of queries. Consequently, their results cannot be used directly to recommend a sampling method for estimating feature values in actual social networks. This study presents an approach to assessing graph sampling methods that focuses on the number of queries. It describes the time taken by the algorithms of typical random walk-based graph sampling methods that require queries, such as a simple random walk with re-weighting (SRW-rw), a non-backtracking random walk with re-weighting (NBRW-rw), and a Metropolis-Hastings random walk (MHRW). The precision of each method was then evaluated experimentally on actual social networks under both a sample-size criterion and a query-count criterion, and the changes between the two evaluations were observed.
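The difference between sample size and query count can be illustrated with a minimal Python sketch. It is not the implementation used in the study: the class `QueryCountingGraph`, the helper `wheel_graph`, and both estimator functions are hypothetical names, the toy graph stands in for a real social network API, and the sketch assumes that a neighbor list is cached once fetched, so only the first request for each node counts as a query. It estimates the average degree with a simple random walk plus re-weighting and with a Metropolis-Hastings random walk; NBRW-rw is omitted for brevity.

```python
import random


class QueryCountingGraph:
    """Hypothetical stand-in for a rate-limited social network API.

    Assumption: neighbor lists are cached, so a node costs one query
    the first time it is requested and nothing afterwards.
    """

    def __init__(self, adjacency):
        self._adj = adjacency
        self._cache = {}
        self.queries = 0

    def neighbors(self, node):
        if node not in self._cache:
            self.queries += 1
            self._cache[node] = self._adj[node]
        return self._cache[node]


def wheel_graph(n):
    """Toy graph: hub 0 linked to every node on a ring 1..n-1 (n >= 4)."""
    adj = {0: list(range(1, n))}
    for i in range(1, n):
        prev_node = n - 1 if i == 1 else i - 1
        next_node = 1 if i == n - 1 else i + 1
        adj[i] = [0, prev_node, next_node]
    return adj


def srw_rw_average_degree(api, start, sample_size, rng):
    """Simple random walk; re-weighting by 1/degree corrects the degree bias."""
    node = start
    inverse_degree_sum = 0.0
    for _ in range(sample_size):
        nbrs = api.neighbors(node)           # query for the current node
        inverse_degree_sum += 1.0 / len(nbrs)
        node = rng.choice(nbrs)              # move to a uniform random neighbor
    return sample_size / inverse_degree_sum  # harmonic mean of sampled degrees


def mhrw_average_degree(api, start, sample_size, rng):
    """Metropolis-Hastings random walk targeting the uniform node distribution."""
    node = start
    degree_sum = 0
    for _ in range(sample_size):
        nbrs = api.neighbors(node)
        degree_sum += len(nbrs)
        candidate = rng.choice(nbrs)
        # The candidate's degree is needed to decide acceptance, so the
        # candidate is queried even if the move is then rejected.
        candidate_degree = len(api.neighbors(candidate))
        if rng.random() < len(nbrs) / candidate_degree:
            node = candidate
    return degree_sum / sample_size


if __name__ == "__main__":
    for name, estimator in [("SRW-rw", srw_rw_average_degree),
                            ("MHRW", mhrw_average_degree)]:
        api = QueryCountingGraph(wheel_graph(50))
        estimate = estimator(api, 1, 30, random.Random(1))
        print(name, "estimate:", round(estimate, 2), "queries:", api.queries)
```

Both walks collect the same number of samples, yet the number of queries they issue can differ, because MHRW must fetch a candidate node before deciding whether to move to it. Under a different caching or API model the counts would change accordingly.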
