The Impact of Network Sampling on Relational Classification

Many real-world networks, such as the Internet, social networks, and biological networks, are massive in size, which makes processing and analysis tasks difficult. For this reason, it is necessary to apply a sampling process that reduces the network size without losing relevant network information. In this paper, we propose a new and intuitive sampling method that exploits the following centrality measures: degree, k-core, clustering, eccentricity, and structural holes. In our experiments, we remove 30% and 50% of the vertices from the original network and evaluate our method on six real-world networks in a relational classification task using six different classifiers. Classification results achieved on the sampled graphs generated by our method are similar to those obtained on the entire graphs. In most cases, our method removed up to 50% of the edges of the original graph. Moreover, the execution time of the classifier's learning step is shorter on the sampled graph.

Keywords: network sampling, relational classification, centrality measures, complex networks
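
As a rough illustration of centrality-based vertex removal, the sketch below drops a fixed fraction of vertices ranked by degree and keeps the induced subgraph. It is not the method proposed in the paper: the abstract does not state how vertices are selected, so removing the lowest-degree vertices first is an assumption made here, and only one of the five listed measures (degree) is used. The names `sample_by_degree`, `edge_count`, and `toy` are hypothetical.

```python
from typing import Dict, Hashable, Set

# Undirected graph as an adjacency map: vertex -> set of neighbours.
Graph = Dict[Hashable, Set[Hashable]]


def sample_by_degree(graph: Graph, remove_fraction: float) -> Graph:
    """Return the induced subgraph left after dropping low-degree vertices.

    ASSUMPTION: lowest-degree vertices are removed first. The abstract
    does not specify the selection rule, so this is illustrative only.
    """
    if not 0.0 <= remove_fraction < 1.0:
        raise ValueError("remove_fraction must be in [0, 1)")
    n_remove = int(len(graph) * remove_fraction)
    # Rank vertices by ascending degree; ties are broken by repr() so the
    # result is deterministic.
    ranked = sorted(graph, key=lambda v: (len(graph[v]), repr(v)))
    removed = set(ranked[:n_remove])
    # Keep surviving vertices and only the edges between them.
    return {v: graph[v] - removed for v in graph if v not in removed}


def edge_count(graph: Graph) -> int:
    """Count undirected edges (each edge appears in two adjacency sets)."""
    return sum(len(neighbours) for neighbours in graph.values()) // 2


# Toy graph: hub 0 linked to vertices 1..5, plus one extra edge 1-2.
toy: Graph = {
    0: {1, 2, 3, 4, 5},
    1: {0, 2},
    2: {0, 1},
    3: {0},
    4: {0},
    5: {0},
}

sampled = sample_by_degree(toy, 0.5)  # remove 50% of the vertices
print(len(sampled), edge_count(toy), edge_count(sampled))
```

On this toy graph, removing half of the vertices also removes half of the edges (6 down to 3), which mirrors the kind of reduction the abstract reports, although real networks will behave differently depending on the measure and selection rule used.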
