High-Performance Graph Data Management and Mining in Cloud Environments with X10

Large-scale graph data management and mining in cloud environments has been widely discussed in recent years. This chapter discusses how X10, a Partitioned Global Address Space (PGAS) language, has been applied to programming data-intensive systems. Specifically, we focus on the problem of creating scalable systems for storing and processing large-scale graph data on HPC clouds with X10. The chapter first discusses large-scale graph processing with X10. Next, it describes the experience of designing and implementing Acacia, a distributed graph database engine, with X10, focusing specifically on Acacia’s RDF extension. Finally, it describes how XGDBench, a graph database benchmarking framework, was developed to analyze the performance of graph database servers. Overall, the chapter describes our experiences implementing such graph-based systems and frameworks with X10.
