An Efficient Graph Mining System for Large Patterns

Interest in designing systems for graph pattern mining has grown in recent years. Existing systems mostly focus on small patterns and struggle to mine larger ones. In this work, we propose Angelica, a single-machine graph pattern mining system designed to support large patterns. We first propose a new computation model called multi-vertex exploration, which allows us to divide a large pattern mining task into smaller matching tasks. Unlike existing systems, which explore vertex by vertex, Angelica explores larger subgraphs by joining small subgraphs. Building on this computation model, we further improve performance with two techniques: an index-based quick pattern technique that addresses the cost of expensive isomorphism checks, and an approximate join that mitigates the subgraph explosion caused by large patterns. The experimental results show that Angelica achieves significant speedups over state-of-the-art graph pattern mining systems and supports large pattern mining that none of the existing systems can handle.
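To make the idea of growing subgraphs by joining smaller ones concrete, the following is a minimal Python sketch. It is not Angelica's implementation: the representation (a subgraph as a frozenset of edges), the helper names `vertices` and `join_subgraphs`, and the rule of joining on any shared vertex are all assumptions made for illustration. It omits isomorphism checks, the quick pattern index, and the approximate join.

```python
from itertools import product


def vertices(subgraph):
    """Return the set of vertices touched by a subgraph (a frozenset of edges)."""
    return {v for edge in subgraph for v in edge}


def join_subgraphs(left, right):
    """Join two collections of subgraphs (hypothetical helper).

    Two distinct subgraphs are joined when they share at least one vertex;
    the result is the union of their edges. Storing results in a set
    removes duplicates produced by different join pairs.
    """
    joined = set()
    for a, b in product(left, right):
        if a != b and vertices(a) & vertices(b):
            joined.add(a | b)
    return joined


# A path graph 0-1-2-3; each edge is stored as (u, v) with u < v (assumed normalized).
edges = [(0, 1), (1, 2), (2, 3)]
single_edges = {frozenset([e]) for e in edges}

# Joining 1-edge subgraphs yields 2-edge subgraphs.
two_edge = join_subgraphs(single_edges, single_edges)

# Joining 2-edge subgraphs reaches the 3-edge subgraph in a single step,
# instead of extending by one vertex at a time.
three_edge = {s for s in join_subgraphs(two_edge, two_edge) if len(s) == 3}
```

In this toy setting the join is a quadratic loop over all pairs. A real system would presumably index subgraphs by their vertices so that only candidates sharing a vertex are compared.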
