A Visual and Statistical Benchmark for Graph Sampling Methods

Visualizing large graphs effectively is challenging, and capturing their statistical properties is equally difficult. Sampling algorithms, developed to make large graphs feasible to observe and analyze, are indispensable for both tasks. Many sampling approaches for graph simplification have been proposed, and they fall into three categories: node sampling, edge sampling, and traversal-based sampling. It remains an open question, however, which sampling technique produces the most representative sample. This paper evaluates commonly used sampling methods through a combined visual and statistical comparison. Initial results indicate that the effectiveness of a sampling method depends on the type of graph, the size of the graph, and the statistical property of interest. The resulting benchmark can serve as a guideline for choosing a suitable method for a particular graph sampling task, and it can be incorporated into graph visualization and analysis tools.
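The three categories named above can be illustrated with a minimal sketch. It is not the benchmark's implementation: the function names (`node_sample`, `edge_sample`, `random_walk_sample`, `mean_degree`), the choice of a simple random walk as the traversal-based method, and the use of mean degree as the compared statistic are all assumptions made for illustration. The graph is assumed to be undirected with integer node labels.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Build undirected adjacency sets from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def induced_edges(adj, nodes):
    """Edges of the subgraph induced by `nodes`, stored as (small, large) pairs."""
    nodes = set(nodes)
    return {(u, v) for u in nodes for v in adj[u] if v in nodes and u < v}


def node_sample(adj, k, rng):
    """Node sampling: pick k nodes uniformly, keep the induced subgraph."""
    nodes = set(rng.sample(sorted(adj), k))
    return nodes, induced_edges(adj, nodes)


def edge_sample(adj, m, rng):
    """Edge sampling: pick m edges uniformly, keep their endpoints."""
    all_edges = sorted(induced_edges(adj, adj))
    edges = set(rng.sample(all_edges, m))
    nodes = {n for edge in edges for n in edge}
    return nodes, edges


def random_walk_sample(adj, k, rng, max_steps=10_000):
    """Traversal-based sampling (assumed variant: simple random walk).

    Walks until k distinct nodes are visited or max_steps is reached,
    then keeps the subgraph induced by the visited nodes.
    """
    current = rng.choice(sorted(adj))
    visited = {current}
    for _ in range(max_steps):
        if len(visited) >= k:
            break
        current = rng.choice(sorted(adj[current]))
        visited.add(current)
    return visited, induced_edges(adj, visited)


def mean_degree(nodes, edges):
    """One simple statistic for comparing a sample with the original graph."""
    return 2 * len(edges) / len(nodes) if nodes else 0.0


# Hypothetical toy graph: a 20-node ring with two chords.
ring = [(i, (i + 1) % 20) for i in range(20)] + [(0, 10), (5, 15)]
graph = build_adjacency(ring)
rng = random.Random(0)

print("original   ", mean_degree(set(graph), induced_edges(graph, graph)))
print("node       ", mean_degree(*node_sample(graph, 8, rng)))
print("edge       ", mean_degree(*edge_sample(graph, 8, rng)))
print("random walk", mean_degree(*random_walk_sample(graph, 8, rng)))
```

Running the three samplers on the same graph and comparing one statistic against the original mirrors, in miniature, the kind of statistical comparison the abstract describes. A fuller comparison would vary the graph type, the graph size, and the statistic.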
