Sampling online social networks by random walk with indirect jumps

Random walk-based sampling methods are gaining popularity and importance in characterizing large networks. While powerful, they suffer from slow mixing when the graph is loosely connected, which results in poor estimation accuracy. Random walk with jumps (RWwJ) can address the slow mixing problem, but it is inapplicable if the graph does not support uniform vertex sampling (UNI). In this work, we develop methods that sample a graph efficiently without requiring UNI while retaining benefits similar to those of RWwJ. We observe that many graphs under study, called target graphs, do not exist in isolation. In many situations, a target graph is related to an auxiliary graph and a bipartite graph, and together they form a better-connected two-layered network structure. This viewpoint benefits graph sampling: if sampling a target graph directly is difficult, we can sample it indirectly with the assistance of the other two graphs. We propose a series of new graph sampling techniques that exploit this two-layered network structure to estimate target graph characteristics. Experiments on both synthetic and real-world networks demonstrate the effectiveness of these techniques.
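
For orientation, the baseline that the abstract contrasts against, a random walk with jumps, can be sketched as follows. This is a minimal illustration, not the indirect-jump method proposed in this work, which the abstract does not specify. It assumes one common parameterization in which a hypothetical jump weight `alpha` makes the walk leave vertex v by a uniform jump with probability alpha / (deg(v) + alpha); the function name `rwwj_estimate`, the adjacency-dict input, and all parameters are assumptions made for the example. Note that the jump step draws a vertex uniformly at random, which is exactly the UNI capability that may be unavailable in practice.

```python
import random


def rwwj_estimate(adj, f, alpha, steps, seed=0):
    """Estimate the vertex average of f with a random walk with jumps.

    adj   : dict mapping each vertex to a list of its neighbors (undirected)
    f     : function from vertex to a number
    alpha : jump weight, assumed > 0 (hypothetical parameter for this sketch)
    """
    rng = random.Random(seed)
    nodes = list(adj)
    v = rng.choice(nodes)
    num = 0.0
    den = 0.0
    for _ in range(steps):
        deg = len(adj[v])
        # Under this parameterization the walk visits v with stationary
        # probability proportional to deg(v) + alpha, so each sample is
        # reweighted by 1 / (deg(v) + alpha) to undo that bias.
        w = 1.0 / (deg + alpha)
        num += w * f(v)
        den += w
        if rng.random() < alpha / (deg + alpha):
            v = rng.choice(nodes)   # uniform jump: requires UNI
        else:
            v = rng.choice(adj[v])  # ordinary random-walk step
    return num / den


if __name__ == "__main__":
    # Two disconnected triangles: a plain random walk could never leave
    # its starting component, but jumps let the walk cover both.
    two_triangles = {
        0: [1, 2], 1: [0, 2], 2: [0, 1],
        3: [4, 5], 4: [3, 5], 5: [3, 4],
    }
    frac = rwwj_estimate(two_triangles, lambda v: 1.0 if v < 3 else 0.0,
                         alpha=1.0, steps=20000)
    print("estimated fraction of vertices in the first triangle:", frac)
```

The two-triangle example is deliberately extreme: with no edges between the components, only the jump step lets the walk reach both, which is the loosely connected situation in which a plain random walk mixes slowly or not at all.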
