A community-based sampling method using DPL for online social networks

In this paper, we propose a new graph sampling method for online social networks that satisfies three requirements. First, a sample graph should preserve the ratio between the number of nodes and the number of edges in the original graph. Second, a sample graph should reflect the topology of the original graph. Third, sample graphs drawn from the same original graph should be consistent with each other. The proposed method employs two techniques: hierarchical community extraction and the densification power law (DPL). To preserve the topology of the original graph, the method partitions it into a set of communities. To preserve the node-to-edge ratio, it uses the DPL, which captures the relationship between the number of nodes and the number of edges in online social networks. In experiments, we take several real-world online social networks, create sample graphs using the existing methods and ours, and analyze the differences between the original graph and the sample graph produced by each method.
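
As an illustration of how the densification power law can fix the number of edges in a sample, the following minimal sketch assumes the commonly stated form of the law, e = c * n^alpha, where n is the number of nodes and e the number of edges. The function names `fit_dpl` and `target_edges`, the snapshot input, and the log-log least-squares fit are assumptions made for this example; the abstract does not specify how the exponent is obtained or how it is applied.

```python
import math


def fit_dpl(snapshots):
    """Fit e = c * n**alpha by least squares on log-log values.

    snapshots: list of (num_nodes, num_edges) pairs, e.g. taken from
    growth snapshots of the original graph (a hypothetical input).
    Returns the pair (alpha, c).
    """
    xs = [math.log(n) for n, _ in snapshots]
    ys = [math.log(e) for _, e in snapshots]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx                       # slope of the log-log line
    c = math.exp(mean_y - alpha * mean_x)   # intercept, back in linear scale
    return alpha, c


def target_edges(sample_nodes, alpha, c):
    """Number of edges the fitted law predicts for a sample of this size."""
    return round(c * sample_nodes ** alpha)


# Synthetic snapshots generated from e = 2 * n**1.5 (illustrative data only).
snapshots = [(100, 2000), (400, 16000), (1600, 128000)]
alpha, c = fit_dpl(snapshots)
edges_for_sample = target_edges(900, alpha, c)
```

Under these assumptions, a sampler would pick the sample's node count first and then keep roughly `edges_for_sample` edges, rather than scaling the edge count linearly with the node count.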
