Design of Efficient Sampling Methods on Hybrid Social-Affiliation Networks

Graph sampling via crawling has become an increasingly popular and important way to measure various characteristics of large-scale complex networks. While powerful, crawling is known to be challenging when the graph is loosely connected or disconnected, which slows down the convergence of random walks and can lead to poor estimation accuracy. In this work, we observe that the graph under study, which we call the target graph, usually does not exist in isolation. In many situations, the target graph is related to an auxiliary graph and an affiliation graph, and it becomes well connected when viewed through these three graphs together, a structure we call a hybrid social-affiliation graph. When sampling the target graph directly is difficult or inefficient, we can sample it indirectly and efficiently with the assistance of the other two graphs. We design three sampling methods on such a hybrid social-affiliation network. Experiments on both synthetic and real datasets demonstrate the effectiveness of the proposed methods.
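
The following minimal Python sketch is not one of the three methods proposed in this paper; it only illustrates why the hybrid view can help. It assumes a hypothetical toy example: users u0 to u5 form a target graph with two disconnected components, groups g0 and g1 form an auxiliary graph, and user-group memberships form the affiliation graph. A simple random walk restricted to the target graph never leaves its starting component, whereas the same walk on the union of the three graphs can reach users in both components. Group nodes act only as stepping stones and are discarded from the sample. The helper names build_graph and random_walk are illustrative, not taken from the paper.

```python
import random
from collections import defaultdict


def build_graph(*edge_lists):
    """Undirected adjacency sets over the union of the given edge lists."""
    adj = defaultdict(set)
    for edges in edge_lists:
        for u, v in edges:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def random_walk(adj, start, steps, rng):
    """Simple random walk: at each step move to a uniformly random neighbor."""
    node = start
    path = [node]
    for _ in range(steps):
        # Sorting makes the walk reproducible under a fixed seed.
        node = rng.choice(sorted(adj[node]))
        path.append(node)
    return path


# Hypothetical toy data: users "u*" form the target graph, groups "g*" the auxiliary graph.
target_edges = [("u0", "u1"), ("u1", "u2"), ("u3", "u4"), ("u4", "u5")]  # two components
auxiliary_edges = [("g0", "g1")]                    # group-group relations
affiliation_edges = [("u2", "g0"), ("u5", "g1")]    # user-group memberships

rng = random.Random(0)
target_only = build_graph(target_edges)
hybrid = build_graph(target_edges, auxiliary_edges, affiliation_edges)

walk_target = random_walk(target_only, "u0", 2000, rng)
walk_hybrid = random_walk(hybrid, "u0", 2000, rng)

# Keep only target-graph nodes (users) as samples; group nodes are discarded.
users_target = {n for n in walk_target if n.startswith("u")}
users_hybrid = {n for n in walk_hybrid if n.startswith("u")}
print(sorted(users_target))  # confined to u0's component
print(sorted(users_hybrid))  # users from both components
```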