GraphZero: Breaking Symmetry for Efficient Graph Mining

Graph mining for structural patterns is a fundamental task in many applications. Compilation-based graph mining systems, represented by AutoMine, generate specialized algorithms for the provided patterns and substantially outperform other systems. However, the generated code causes substantial computation redundancy and the compilation process incurs too much overhead to be used online, both due to the inherent symmetry in the structural patterns. In this paper, we propose an optimizing compiler, GraphZero, to completely address these limitations through symmetry breaking based on group theory. GraphZero implements three novel techniques. First, its schedule explorer efficiently prunes the schedule space without missing any high-performance schedule. Second, it automatically generates and enforces a set of restrictions to eliminate computation redundancy. Third, it generalizes orientation, a surprisingly effective optimization that was mainly used for clique patterns, to apply to arbitrary patterns. Evaluated on multiple graph mining applications and complex patterns with 7 real-world graph datasets, GraphZero demonstrates up to 40X performance improvement and up to 197X reduction on schedule generation overhead over AutoMine.

[1]  Julian Shun,et al.  Multicore triangle computations without tuning , 2015, 2015 IEEE 31st International Conference on Data Engineering.

[2]  Janez Demsar,et al.  A combinatorial approach to graphlet counting , 2014, Bioinform..

[3]  Sebastiano Vigna,et al.  The webgraph framework I: compression techniques , 2004, WWW '04.

[4]  Lawrence B. Holder,et al.  A Selectivity based approach to Continuous Pattern Detection in Streaming Graphs , 2015, EDBT.

[5]  Jeremy Chen,et al.  Graphflow: An Active Graph Database , 2017, SIGMOD Conference.

[6]  Panos Kalnis,et al.  ScaleMine: Scalable Parallel Frequent Subgraph Mining in a Single Large Graph , 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.

[7]  Keshav Pingali,et al.  Parallel triangle counting and k-truss identification using graph-centric methods , 2017, 2017 IEEE High Performance Extreme Computing Conference (HPEC).

[8]  Hosung Park,et al.  What is Twitter, a social network or a news media? , 2010, WWW '10.

[9]  Mohammed J. Zaki,et al.  Arabesque: a system for distributed graph mining , 2015, SOSP.

[10]  Keshav Pingali,et al.  A compiler for throughput optimization of graph algorithms on GPUs , 2016, OOPSLA.

[11]  Xin Jin,et al.  ASAP: Fast, Approximate Graph Pattern Mining at Scale , 2018, OSDI.

[12]  Jon M. Kleinberg,et al.  Group formation in large social networks: membership, growth, and evolution , 2006, KDD '06.

[13]  H. Howie Huang,et al.  TriCore: Parallel Triangle Counting on GPUs , 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.

[14]  Guy E. Blelloch,et al.  GraphChi: Large-Scale Graph Computation on Just a PC , 2012, OSDI.

[15]  Kunle Olukotun,et al.  Green-Marl: a DSL for easy and efficient graph analysis , 2012, ASPLOS XVII.

[16]  Xueqi Cheng,et al.  Kaleido: An Efficient Out-of-core Graph Mining System on A Single Machine , 2019, 2020 IEEE 36th International Conference on Data Engineering (ICDE).

[17]  James Cheng,et al.  Triangle listing in massive networks , 2012, TKDD.

[18]  Guy E. Blelloch,et al.  Ligra: a lightweight graph processing framework for shared memory , 2013, PPoPP '13.

[19]  Bo Wu,et al.  AutoMine: harmonizing high-level abstraction and high performance for graph mining , 2019, SOSP.

[20]  Jiawei Han,et al.  Node, Motif and Subgraph: Leveraging Network Functional Blocks Through Structural Convolution , 2018, 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM).

[21]  Kai Wang,et al.  RStream: Marrying Relational Algebra with Streaming for Efficient Graph Mining on A Single Machine , 2018, OSDI.

[22]  Matthieu Latapy,et al.  Main-memory triangle computations for very large (sparse (power-law)) graphs , 2008, Theor. Comput. Sci..

[23]  Aart J. C. Bik,et al.  Pregel: a system for large-scale graph processing , 2010, SIGMOD Conference.

[24]  Monica S. Lam,et al.  SociaLite: Datalog extensions for efficient social network analysis , 2013, 2013 IEEE 29th International Conference on Data Engineering (ICDE).

[25]  Danai Koutra,et al.  Graph based anomaly detection and description: a survey , 2014, Data Mining and Knowledge Discovery.

[26]  Rajiv Gupta,et al.  Load the Edges You Need: A Generic I/O Optimization for Disk-based Graph Processing , 2016, USENIX Annual Technical Conference.

[27]  Kunle Olukotun,et al.  EmptyHeaded: A Relational Engine for Graph Processing , 2015, ACM Trans. Database Syst..

[28]  Keshav Pingali,et al.  A Study of Partitioning Policies for Graph Analytics on Large-scale Distributed Platforms , 2018, Proc. VLDB Endow..

[29]  KarypisGeorge,et al.  Multilevelk-way Partitioning Scheme for Irregular Graphs , 1998 .

[30]  Mohammed J. Zaki,et al.  A distributed approach for graph mining in massive networks , 2016, Data Mining and Knowledge Discovery.

[31]  Bo Wu,et al.  ApproxG: Fast Approximate Parallel Graphlet Counting Through Accuracy Control , 2018, 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID).

[32]  Marco Rosa,et al.  Layered label propagation: a multiresolution coordinate-free ordering for compressing social networks , 2010, WWW.

[33]  Joseph M. Hellerstein,et al.  Distributed GraphLab: A Framework for Machine Learning in the Cloud , 2012, Proc. VLDB Endow..

[34]  Keshav Pingali,et al.  Abelian: A Compiler for Graph Analytics on Distributed, Heterogeneous Platforms , 2018, Euro-Par.

[35]  Christos Faloutsos,et al.  Graphs over time: densification laws, shrinking diameters and possible explanations , 2005, KDD '05.

[36]  Panos Kalnis,et al.  GRAMI: Frequent Subgraph and Pattern Mining in a Single Large Graph , 2014, Proc. VLDB Endow..

[37]  Kang Chen,et al.  Wonderland: A Novel Abstraction-Based Out-Of-Core Graph Processing System , 2018, ASPLOS.

[38]  Wenguang Chen,et al.  Gemini: A Computation-Centric Distributed Graph Processing System , 2016, OSDI.

[39]  Willy Zwaenepoel,et al.  X-Stream: edge-centric graph processing using streaming partitions , 2013, SOSP.

[40]  Ryan A. Rossi,et al.  Efficient Graphlet Counting for Large Networks , 2015, 2015 IEEE International Conference on Data Mining.

[41]  H. Howie Huang,et al.  T ri C ore : parallel triangle counting on GPUs , 2018 .

[42]  Jianzhong Li,et al.  Fast Rectangle Counting on Massive Networks , 2018, 2018 IEEE International Conference on Data Mining (ICDM).

[43]  James Cheng,et al.  G-Miner: an efficient task-oriented graph mining system , 2018, EuroSys.

[44]  Srinivasan Parthasarathy,et al.  Fractal: A General-Purpose Graph Pattern Mining System , 2019, SIGMOD Conference.

[45]  Jiawei Han,et al.  gSpan: graph-based substructure pattern mining , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[46]  Haibo Chen,et al.  Fast and Concurrent RDF Queries with RDMA-Based Distributed Graph Exploration , 2016, OSDI.

[47]  Mark E. J. Newman,et al.  The Structure and Function of Complex Networks , 2003, SIAM Rev..

[48]  Sebastian Wernicke,et al.  FANMOD: a tool for fast network motif detection , 2006, Bioinform..

[49]  Jure Leskovec,et al.  Predicting positive and negative links in online social networks , 2010, WWW '10.

[50]  Wenguang Chen,et al.  GridGraph: Large-Scale Graph Processing on a Single Machine Using 2-Level Hierarchical Partitioning , 2015, USENIX ATC.

[51]  Rajiv Gupta,et al.  Efficient Processing of Large Graphs via Input Reduction , 2016, HPDC.