Sampling Heterogeneous Networks

Online social networks are characterized by their large scale and by heterogeneous semantic relationships. For services such as Facebook or Twitter, it is very difficult to obtain the fully observed network without privileged internal access to the data. Social network sampling addresses this problem: it aims to identify a representative subgraph that preserves certain properties of the network, under the constraint that no information about any instance in the network is known before that instance is sampled. This study tackles heterogeneous network sampling by considering the conditional dependency of node types and link types; we design a property, the Relational Profile, to account for this characterization, and we propose a sampling method that preserves it. We evaluate our model from three angles. First, we show that the proposed sampling method preserves the Relational Profile more faithfully. Second, we evaluate the usefulness of the Relational Profile by showing that this information is beneficial for link prediction tasks. Finally, we evaluate whether networks sampled by our method can be used to train more accurate prediction models than networks produced by other methods.
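To make the idea of a conditional dependency between node types and link types concrete, the following is a minimal sketch, not the paper's definition of the Relational Profile. It assumes one plausible reading: the empirical distribution of the neighbor's node type given the source node's type and the link type. The function names, the toy network, and the distance measure (mean total-variation distance) are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def conditional_type_profile(node_type, edges):
    """Estimate P(neighbor type | source type, link type) from typed edges.

    Assumption: this is an illustrative stand-in, not the paper's
    Relational Profile. `edges` holds (source, link_type, target) triples.
    """
    counts = defaultdict(Counter)
    for src, link_type, dst in edges:
        counts[(node_type[src], link_type)][node_type[dst]] += 1
    profile = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        profile[key] = {t: c / total for t, c in counter.items()}
    return profile


def profile_distance(reference, sample):
    """Mean total-variation distance over the reference's conditioning keys.

    A key that is missing from the sample counts as the maximal distance 1.0
    (a hypothetical convention chosen for this sketch).
    """
    if not reference:
        return 0.0
    total = 0.0
    for key, ref_dist in reference.items():
        if key not in sample:
            total += 1.0
            continue
        smp_dist = sample[key]
        types = set(ref_dist) | set(smp_dist)
        total += 0.5 * sum(
            abs(ref_dist.get(t, 0.0) - smp_dist.get(t, 0.0)) for t in types
        )
    return total / len(reference)


# Toy heterogeneous network (hypothetical data).
node_type = {"alice": "user", "bob": "user", "p1": "post"}
full_edges = [
    ("alice", "follows", "bob"),
    ("alice", "likes", "p1"),
    ("bob", "likes", "p1"),
    ("alice", "mentions", "bob"),
    ("bob", "mentions", "p1"),
]
sampled_edges = full_edges[:2] + [("alice", "mentions", "bob")]

full_profile = conditional_type_profile(node_type, full_edges)
sample_profile = conditional_type_profile(node_type, sampled_edges)
distance = profile_distance(full_profile, sample_profile)
```

Under these assumptions, a sampler that preserves the profile well would yield a small `distance` between the sampled subgraph and the full network; here the sample distorts only the `mentions` distribution.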
