Graph analytics and storage

Many Big Data analytics tasks essentially explore the relationships among interconnected entities, which are naturally represented as graphs. However, the irregular data access patterns of graph computations make it a fundamental challenge to deliver highly efficient solutions for large-scale graph analytics, and this inefficiency restricts the use of many graph algorithms in Big Data scenarios. To address these performance issues, we develop a graph processing system called System G, which explores efficient graph data organizations for parallel computing architectures. We discuss various graph data organizations and their impact on data locality during graph traversals, which in turn produces different cache performance behavior on the processor side. In addition, we analyze data parallelism from an architectural perspective and experimentally demonstrate the efficiency of System G-based graph analytics. We present experimental results on commodity multicore clusters and IBM PERCS supercomputers to illustrate the performance of System G for large-scale graph analytics.
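
The abstract does not describe System G's internal layouts, so the following is only a minimal sketch of how data organization shapes traversal locality. It assumes a compressed sparse row (CSR) layout, one widely used graph organization and not necessarily the one System G uses. The functions `build_csr` and `bfs_levels` are hypothetical names chosen for illustration.

```python
from collections import deque


def build_csr(num_vertices, edges):
    """Pack a directed edge list into CSR arrays (offsets, targets).

    Illustrative layout only; not taken from System G.
    """
    degree = [0] * num_vertices
    for src, _ in edges:
        degree[src] += 1
    # offsets[v] .. offsets[v + 1] delimits v's neighbors inside `targets`
    offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        offsets[v + 1] = offsets[v] + degree[v]
    targets = [0] * len(edges)
    cursor = offsets[:-1]  # slicing copies: next free slot for each vertex
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def bfs_levels(offsets, targets, root):
    """Queue-based BFS over CSR; returns hop distance per vertex (-1 if unreachable)."""
    n = len(offsets) - 1
    level = [-1] * n
    level[root] = 0
    frontier = deque([root])
    while frontier:
        u = frontier.popleft()
        # u's neighbors are read from one contiguous slice of `targets` ...
        for v in targets[offsets[u]:offsets[u + 1]]:
            # ... but level[v] jumps to arbitrary vertex IDs (irregular access)
            if level[v] == -1:
                level[v] = level[u] + 1
                frontier.append(v)
    return level


# Usage: a small directed graph 0->1, 0->2, 1->3, 2->3, 3->4
offsets, targets = build_csr(5, [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)])
print(offsets)                            # [0, 2, 3, 4, 5, 5]
print(targets)                            # [1, 2, 3, 3, 4]
print(bfs_levels(offsets, targets, 0))    # [0, 1, 1, 2, 3]
```

Because each vertex's neighbor list occupies one contiguous slice, scanning it is sequential and cache-friendly. The per-neighbor lookups into `level`, however, land on arbitrary vertex IDs. That residual irregularity is why the choice of data organization (and of vertex ordering) affects cache behavior during traversals.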
