Pattern Morphing for Efficient Graph Mining

Graph mining applications analyze the structural properties of large graphs by finding subgraph isomorphisms, which makes them computationally intensive. Existing graph mining techniques, including both custom graph mining applications and general-purpose graph mining systems, develop efficient execution plans to speed up the exploration of the given query patterns, which represent the subgraph structures of interest. In this paper, we step beyond the traditional philosophy of optimizing execution plans for a given set of patterns and instead exploit the sub-structural similarities across different query patterns. We propose Pattern Morphing, a technique that enables structure-aware algebra over patterns to accurately infer the results for a given set of patterns from the results of a completely different set of patterns that are less expensive to compute. Pattern morphing "morphs" (or converts) a given set of query patterns into alternative patterns while retaining full equivalence. It is a general technique that supports operations over the matches of a pattern beyond just counting (e.g., support calculation and enumeration), which makes it applicable to graph mining applications such as Motif Counting and Frequent Subgraph Mining. Since pattern morphing mainly transforms query patterns before their exploration starts, it can be easily incorporated into existing general-purpose graph mining systems. We evaluate the effectiveness of pattern morphing by incorporating it into Peregrine, a recent state-of-the-art graph mining system, and show that pattern morphing significantly improves the performance of different graph mining applications.
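To make the idea of algebra over patterns concrete, the following minimal sketch uses a well-known counting identity rather than the algorithm of the paper itself. It assumes an undirected simple graph with integer vertex ids, and every function name in it is hypothetical. The number of vertex-induced wedges (3-vertex paths whose endpoints are not adjacent) is inferred from two cheaper quantities: the edge-induced wedge count, which has a closed form in the vertex degrees, and the triangle count. Each triangle contains three edge-induced wedges, one per choice of center vertex, so those are subtracted.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {vertex: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def count_triangles(adj):
    """Count each triangle once by enforcing the order u < v < w."""
    total = 0
    for u in adj:
        for v in adj[u]:
            if u < v:
                total += sum(1 for w in adj[u] & adj[v] if w > v)
    return total


def count_wedges_edge_induced(adj):
    """Edge-induced wedges: any 2 neighbors of a center, closed or not."""
    return sum(len(nbrs) * (len(nbrs) - 1) // 2 for nbrs in adj.values())


def count_wedges_vertex_induced_direct(adj):
    """Baseline: enumerate wedges and reject those whose endpoints are adjacent."""
    total = 0
    for nbrs in adj.values():
        for a, b in combinations(sorted(nbrs), 2):
            if b not in adj[a]:
                total += 1
    return total


def count_wedges_vertex_induced_morphed(adj):
    """Infer the vertex-induced count from two other patterns' counts.

    Assumption for illustration: this identity stands in for the more
    general pattern conversions described in the abstract.
    """
    return count_wedges_edge_induced(adj) - 3 * count_triangles(adj)


if __name__ == "__main__":
    # A triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
    adj = build_adjacency([(0, 1), (1, 2), (0, 2), (2, 3)])
    print(count_wedges_vertex_induced_direct(adj))
    print(count_wedges_vertex_induced_morphed(adj))
```

In this sketch the morphed version never tests whether the two wedge endpoints are adjacent. It trades that per-match check for a triangle count, which a mining system may already need for another query pattern.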
