A tale of three graphs: Sampling design on hybrid social-affiliation networks

Random walk-based graph sampling methods have become increasingly popular and important for characterizing large-scale complex networks. While powerful, they are known to perform poorly when the graph is loosely connected: the random walk converges slowly, which can result in poor estimation accuracy. In this work, we observe that the graphs under study, called target graphs, usually do not exist in isolation. A target graph is often related to an auxiliary graph and an affiliation graph, and it becomes better connected when the three graphs are viewed as a whole, which we call a hybrid social-affiliation network. This viewpoint brings extra benefits to the graph sampling framework: when directly sampling a target graph is difficult or inefficient, we can sample it efficiently with the assistance of the auxiliary and affiliation graphs. We propose three sampling methods on such a hybrid social-affiliation network to estimate target graph characteristics, and conduct extensive experiments on both synthetic and real datasets to demonstrate the effectiveness of these new sampling methods.
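
The connectivity argument above can be illustrated with a small sketch. This is not one of the three sampling methods proposed in the paper, whose details are not given in this abstract; it is a plain simple random walk with standard degree re-weighting, run on a toy graph. All node names, edge lists, and helper functions (`build_graph`, `random_walk`, `estimate_target_mean`) are hypothetical and chosen only for illustration. The target graph has two disconnected components, so a walk confined to it never leaves its starting component, while a walk on the union of the target, auxiliary, and affiliation edges can reach every target node.

```python
import random
from collections import defaultdict


def build_graph(*edge_lists):
    """Merge any number of undirected edge lists into one adjacency map."""
    adj = defaultdict(set)
    for edges in edge_lists:
        for u, v in edges:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def random_walk(adj, start, steps, rng):
    """Simple random walk: move to a uniformly chosen neighbor at each step."""
    node = start
    visited = []
    for _ in range(steps):
        # sorted() makes the walk reproducible for a fixed seed
        node = rng.choice(sorted(adj[node]))
        visited.append(node)
    return visited


def estimate_target_mean(adj, walk, is_target, value):
    """Ratio estimator of the mean of value(v) over target nodes.

    A simple random walk visits node v with probability proportional to
    its degree, so each visited target node is weighted by 1 / degree.
    """
    num = den = 0.0
    for v in walk:
        if is_target(v):
            w = 1.0 / len(adj[v])
            num += w * value(v)
            den += w
    if den == 0.0:
        raise ValueError("the walk visited no target node")
    return num / den


# Toy data (assumed for illustration): two disconnected target triangles,
# a small auxiliary path, and affiliation edges linking the two layers.
TARGET_EDGES = [("t0", "t1"), ("t1", "t2"), ("t0", "t2"),
                ("t3", "t4"), ("t4", "t5"), ("t3", "t5")]
AUXILIARY_EDGES = [("u0", "u1"), ("u1", "u2")]
AFFILIATION_EDGES = [("u0", "t0"), ("u1", "t4"), ("u2", "t3")]


def is_target(v):
    return v.startswith("t")


def value(v):
    # Hypothetical node attribute: the numeric suffix of the node name
    return int(v[1:])


if __name__ == "__main__":
    rng = random.Random(0)
    target_only = build_graph(TARGET_EDGES)
    hybrid = build_graph(TARGET_EDGES, AUXILIARY_EDGES, AFFILIATION_EDGES)

    stuck = random_walk(target_only, "t0", 1000, rng)
    print("target-only walk reached:", sorted(set(stuck)))

    walk = random_walk(hybrid, "t0", 20000, rng)
    print("hybrid estimate of mean attribute:",
          estimate_target_mean(hybrid, walk, is_target, value))
```

In this toy setting the true mean of the attribute over the six target nodes is 2.5. The walk on the hybrid graph is re-weighted by each node's degree in the hybrid graph, since that is the graph the walk actually traverses; weighting by the degree in the target graph alone would bias the estimate.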
