Parallel processing of filtered queries in attributed semantic graphs

Executing complex analytic queries on massive semantic graphs is a challenging big-data analytics problem that requires high-performance parallel computing. In a semantic graph, vertices and edges carry attributes of various types, and analytic queries typically depend on the values of these attributes. The computation must therefore view the graph through a filter that passes only the individual vertices and edges of interest. Previous work developed the Knowledge Discovery Toolbox (KDT), a sophisticated Python library for parallel graph computations. In KDT, users can write custom graph algorithms by specifying the operations performed between edges and vertices (semiring operations), and they can customize existing graph algorithms by writing filters. This high-level customization language lets domain scientists express their graph analytics requirements productively. However, the customized queries perform poorly because every vertex and edge requires a call into the Python virtual machine.

In this work, we use the Selective Embedded Just-In-Time Specialization (SEJITS) approach to automatically translate programmer-defined semiring operations and filters into a lower-level efficiency language, bypassing the upcall into Python. We evaluate our approach against the high-performance Combinatorial BLAS engine and show that it combines the benefits of programming in a high-level language with those of executing in a low-level parallel environment. We increase the system's flexibility by developing techniques that let users define new vertex and edge types from Python. We also present a new Roofline model for graph traversals and show that our approach achieves performance significantly closer to the bounds the Roofline suggests.
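To make the filter and semiring vocabulary concrete, here is a minimal pure-Python sketch of a filtered breadth-first search. It is not KDT's API. The adjacency-dict graph layout, the `filtered_bfs` function, and the `kind` edge attribute are hypothetical and chosen only for illustration. Each frontier expansion behaves like one sparse matrix-vector product over a semiring: "multiply" forwards the frontier vertex's id along every edge that passes the filter, and "add" keeps the smallest candidate parent for each newly reached vertex. The `edge_filter` callable is invoked once per examined edge, which is the kind of per-edge Python call the abstract identifies as the performance bottleneck.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical attributed graph: vertex -> list of (target, edge attributes).
Edge = Tuple[int, Dict[str, object]]
Graph = Dict[int, List[Edge]]


def filtered_bfs(graph: Graph, source: int,
                 edge_filter: Callable[[Dict[str, object]], bool]) -> Dict[int, int]:
    """Return BFS parent pointers, traversing only edges that pass edge_filter."""
    parents: Dict[int, int] = {source: source}
    frontier = [source]
    while frontier:
        candidates: Dict[int, int] = {}
        for u in frontier:
            for v, attrs in graph.get(u, []):
                # Per-edge call into an arbitrary Python predicate (the filter).
                if not edge_filter(attrs):
                    continue
                if v in parents:  # already discovered in an earlier level
                    continue
                # Semiring "multiply": propagate u as the candidate parent.
                # Semiring "add": keep the minimum candidate per vertex.
                prev = candidates.get(v)
                candidates[v] = u if prev is None else min(prev, u)
        parents.update(candidates)
        frontier = sorted(candidates)  # newly discovered vertices form the next frontier
    return parents


# Hypothetical example: only "retweet" edges are of interest.
example_graph: Graph = {
    0: [(1, {"kind": "retweet"}), (2, {"kind": "follow"})],
    1: [(3, {"kind": "retweet"})],
    2: [(3, {"kind": "retweet"})],
    3: [],
}

print(filtered_bfs(example_graph, 0, lambda a: a["kind"] == "retweet"))  # {0: 0, 1: 0, 3: 1}
print(filtered_bfs(example_graph, 0, lambda a: True))  # {0: 0, 1: 0, 2: 0, 3: 1}
```

Because the filter is an opaque Python callable, an engine written in a lower-level language would have to call back into the interpreter for every edge it examines. Translating such predicates and semiring operations ahead of the traversal is what removes that upcall.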
Finally, to better understand the complex interaction with the underlying architecture, we present an analysis based on performance counters that quantifies the improvement in hardware behavior under our SEJITS methodology. Overall, we demonstrate the first known solution to the problem of obtaining high performance from a productivity language when applying graph algorithms selectively on semantic graphs with hundreds of millions of edges, scaling to thousands of processors.

Highlights:
- A domain-specific language for flexible filtering and customization of graph algorithms.
- A Roofline performance model for high-performance graph exploration.
- Experimental demonstration of excellent performance and scaling.
- Demonstration of generality by specializing two different graph algorithms.
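For orientation, the generic Roofline bound caps attainable performance at the lower of the machine's peak compute rate and the product of arithmetic intensity and peak memory bandwidth. The restatement below for traversals measured in traversed edges per second (TEPS) is a simplified sketch, not the graph-specific Roofline model developed in this work; the symbol $b$ (bytes moved per traversed edge) and the numeric values in the example are hypothetical and serve only to show the arithmetic.

$$
P_{\text{attainable}} = \min\left(P_{\text{peak}},\; I \cdot B_{\text{peak}}\right)
$$

Here $I$ is the arithmetic intensity (operations per byte of memory traffic) and $B_{\text{peak}}$ is the peak memory bandwidth. Graph traversal performs very little arithmetic per edge, so the bandwidth term typically dominates. If a traversal moves $b$ bytes per traversed edge, then

$$
\text{TEPS} \le \frac{B_{\text{peak}}}{b}, \qquad \text{e.g.}\quad \frac{50\ \text{GB/s}}{16\ \text{B/edge}} = 3.125 \times 10^{9}\ \text{TEPS}.
$$

Under this simple model, reading edge attributes to evaluate a filter increases $b$ and lowers the bound, which is one reason a filtered traversal needs its own accounting.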
