Efficient Mining of Frequent Subgraphs with Two-Vertex Exploration

Frequent Subgraph Mining (FSM) is a key task in many graph mining and machine learning applications. Numerous FSM systems have been proposed over the past decade. Although these systems perform well on small patterns (no more than four vertices), we find that they struggle to mine larger patterns. In this work, we propose a novel two-vertex exploration strategy to accelerate the mining process. Compared with the single-vertex exploration adopted by previous systems, two-vertex exploration avoids excessive memory consumption and significantly reduces memory access overhead. We further improve performance with an index-based quick pattern technique that reduces the overhead of isomorphism checks, and a subgraph sampling technique that mitigates subgraph explosion. Experimental results show that our system achieves significant speedups over state-of-the-art graph pattern mining systems and supports larger pattern mining tasks that none of the existing systems can handle.
