Preserving Minority Structures in Graph Sampling

Sampling is a widely used graph reduction technique for accelerating graph computations and simplifying graph visualizations. From a comprehensive analysis of the literature on graph sampling, we hypothesize that existing algorithms cannot effectively preserve minority structures, which are rare and small in a graph but very important in graph analysis. In this work, we first conduct a pilot user study to investigate the representative minority structures that are most appealing to human viewers. We then perform an experimental study to evaluate how well existing graph sampling algorithms preserve minority structures. The results confirm our hypothesis and suggest key points for designing a new graph sampling approach named mino-centric graph sampling (MCGS). In this approach, a triangle-based algorithm and a cut-point-based algorithm are proposed to efficiently identify minority structures. A set of importance assessment criteria is designed to guide the preservation of important minority structures. Three optimization objectives are introduced into a greedy strategy to balance the preservation of minority and majority structures and to suppress the generation of new minority structures. A series of experiments and case studies is conducted to evaluate the effectiveness of the proposed MCGS.
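The abstract names a triangle-based and a cut-point-based identification step without giving their details. As a rough illustration only, the sketch below shows the two generic graph primitives those names suggest: enumerating triangles and finding cut points (articulation points) in an undirected graph. It is not the MCGS algorithm itself; the function names, the edge-list input, and the toy graph are assumptions made for this example.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from an edge list (self-loops ignored)."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def find_triangles(adj):
    """Return every triangle as a sorted tuple (u, v, w) with u < v < w."""
    triangles = set()
    for u in adj:
        for v in adj[u]:
            if u < v:
                # Common neighbours of u and v close a triangle.
                for w in adj[u] & adj[v]:
                    if v < w:
                        triangles.add((u, v, w))
    return triangles


def find_cut_points(adj):
    """Return the cut points (articulation points) using a DFS low-link pass."""
    disc, low = {}, {}
    cut_points = set()
    counter = 0

    def dfs(u, parent):
        nonlocal counter
        disc[u] = low[u] = counter
        counter += 1
        children = 0
        for v in adj[u]:
            if v not in disc:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                # A non-root node is a cut point if some child subtree
                # cannot reach an ancestor of u without going through u.
                if parent is not None and low[v] >= disc[u]:
                    cut_points.add(u)
            elif v != parent:
                low[u] = min(low[u], disc[v])
        # The DFS root is a cut point only if it has several DFS children.
        if parent is None and children > 1:
            cut_points.add(u)

    for node in list(adj):
        if node not in disc:
            dfs(node, None)
    return cut_points


# Hypothetical toy graph: two triangles joined by the bridge edge (2, 3).
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5), (5, 3)]
adj = build_adjacency(edges)
triangles = find_triangles(adj)
cut_points = find_cut_points(adj)
```

On this toy graph the two triangles and the two endpoints of the bridge are what the functions report. The recursive DFS is kept for readability; a graph with very long paths would need an iterative version to stay within Python's recursion limit.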
