A two-phase sampling algorithm for social networks

In recent years, the data used for social network analysis have become so large and restrictive to work with that analysis is often carried out on a small, representative sample of the original network instead. Sampling a social network means collecting a small subgraph of the original network whose properties closely match those of the original. Because sampling has an important impact on social network analysis, many network sampling algorithms have been proposed. In this paper, we propose a two-phase algorithm for sampling online social networks. In the first phase, the algorithm iteratively constructs several sets of minimum spanning trees (MSTs) of the network. In the second phase, it sorts the vertices of the MSTs and merges them to form a sampled network. Several simulation experiments are conducted to examine the performance of the proposed algorithm on different networks. The results are compared with those of counterpart algorithms in terms of the KS-test and ND-test. The results show that the proposed algorithm outperforms the existing algorithms.
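The two phases can be illustrated with a minimal Python sketch. The abstract does not say how edge weights are chosen, how the vertices are sorted, or how the trees are merged, so everything specific here is an assumption made for illustration: each iteration draws fresh random edge weights, vertices are ranked by their accumulated degree across all trees, and the sample is the subgraph induced by the top-ranked vertices. The names `kruskal_mst`, `two_phase_sample`, `ks_distance`, `num_trees`, and `sample_size` are hypothetical, and `ks_distance` is the standard two-sample Kolmogorov-Smirnov statistic, which may differ in detail from the KS-test used in the paper.

```python
import random
from collections import Counter


def kruskal_mst(nodes, weighted_edges):
    """Return the edges of a minimum spanning forest (Kruskal's algorithm)."""
    parent = {v: v for v in nodes}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for _, u, v in sorted(weighted_edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two components, so keep it
            parent[root_u] = root_v
            tree.append((u, v))
    return tree


def two_phase_sample(nodes, edges, sample_size, num_trees=10, seed=0):
    """Sketch of a two-phase MST-based sampler (assumed details, see lead-in)."""
    rng = random.Random(seed)
    score = Counter()

    # Phase 1 (assumption): build several MSTs under fresh random edge weights
    # and accumulate each vertex's degree over all trees.
    for _ in range(num_trees):
        weighted = [(rng.random(), u, v) for u, v in edges]
        for u, v in kruskal_mst(nodes, weighted):
            score[u] += 1
            score[v] += 1

    # Phase 2 (assumption): sort vertices by accumulated tree degree, keep the
    # top ones, and merge them into the induced subgraph of the original network.
    ranked = sorted(nodes, key=lambda v: (-score[v], v))
    kept = set(ranked[:sample_size])
    sampled_edges = [(u, v) for u, v in edges if u in kept and v in kept]
    return kept, sampled_edges


def ks_distance(xs, ys):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    def cdf(data, t):
        return sum(1 for d in data if d <= t) / len(data)

    points = sorted(set(xs) | set(ys))
    return max(abs(cdf(xs, t) - cdf(ys, t)) for t in points)


if __name__ == "__main__":
    # Toy star network: vertex 0 is connected to vertices 1..5.
    nodes = list(range(6))
    edges = [(0, i) for i in range(1, 6)]
    kept, sampled_edges = two_phase_sample(nodes, edges, sample_size=3)
    print(sorted(kept), sampled_edges)
```

A sample produced this way could be scored by passing the degree sequences of the original and sampled networks to `ks_distance`, where a smaller value indicates closer degree distributions.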
