A Sampling-Based Tool for Scaling Graph Datasets

Graph processing has become a topic of interest in many domains. However, we still observe a lack of representative datasets for in-depth performance and scalability analysis. Neither data collections nor graph generators provide enough diversity and control for thorough analysis. To address this problem, we propose a heuristic method for scaling existing graphs. Our approach, based on sampling and interconnection, provides a scaled "version" of a given graph. Moreover, we provide analytical models that predict the topological properties of the scaled graphs (such as the diameter, degree distribution, density, or clustering coefficient) and enable the user to tweak these properties. Property control is achieved through a portfolio of graph interconnection methods (e.g., star, ring, chain, fully connected) applied when combining the graph samples. We further implement our method as an open-source tool that can be used to quickly provide families of datasets for in-depth benchmarking of graph processing algorithms. Our empirical evaluation demonstrates that our tool provides scaled graphs across a wide range of sizes, whose properties match well with model predictions and/or user requirements. Finally, we illustrate, through a case study, how scaled graphs can be used for in-depth performance analysis of graph processing algorithms.
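
The sample-and-interconnect idea can be illustrated with a minimal sketch. It is not the tool's implementation: the abstract does not specify the sampling method or how connecting edges are placed, so random-node sampling, uniformly random bridge endpoints, and all names here (`sample_induced`, `topology_pairs`, `scale_graph`, `bridges`) are assumptions made for illustration only.

```python
import random
from itertools import combinations


def sample_induced(edges, fraction, rng):
    """Assumed sampler: keep a random fraction of nodes and the edges among them."""
    nodes = sorted({v for edge in edges for v in edge})
    k = max(1, round(fraction * len(nodes)))
    kept = set(rng.sample(nodes, k))
    return kept, [(u, v) for u, v in edges if u in kept and v in kept]


def topology_pairs(n, topology):
    """Return the pairs of sample indices that the chosen topology links."""
    if topology == "chain":
        return [(i, i + 1) for i in range(n - 1)]
    if topology == "ring":
        pairs = [(i, i + 1) for i in range(n - 1)]
        if n > 2:
            pairs.append((n - 1, 0))  # close the chain into a ring
        return pairs
    if topology == "star":
        return [(0, i) for i in range(1, n)]  # sample 0 acts as the hub
    if topology == "full":
        return list(combinations(range(n), 2))
    raise ValueError(f"unknown topology: {topology}")


def scale_graph(edges, n_samples, fraction, topology, bridges=1, seed=0):
    """Combine n_samples samples of a graph into one larger edge list.

    Nodes are relabelled as (sample_index, original_node) so that the
    samples stay disjoint. Each linked pair of samples receives `bridges`
    edges with randomly chosen endpoints (an assumption of this sketch).
    """
    rng = random.Random(seed)
    scaled, sample_nodes = [], []
    for i in range(n_samples):
        kept, sub = sample_induced(edges, fraction, rng)
        sample_nodes.append([(i, v) for v in sorted(kept)])
        scaled.extend(((i, u), (i, v)) for u, v in sub)
    for a, b in topology_pairs(n_samples, topology):
        for _ in range(bridges):
            scaled.append((rng.choice(sample_nodes[a]), rng.choice(sample_nodes[b])))
    return scaled


if __name__ == "__main__":
    base = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
    big = scale_graph(base, n_samples=4, fraction=0.75, topology="ring", bridges=2)
    print(len(big), "edges in the scaled graph")
```

In this sketch the topology determines how many sample pairs are linked (n-1 for chain and star, n for a ring of more than two samples, n(n-1)/2 for fully connected), which suggests why the choice of interconnection can influence properties such as diameter and density of the combined graph.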
