Fast and scalable algorithms for mining subgraphs in a single large graph

Abstract

Mining frequent subgraphs is an important problem in graph mining. The task is to find all subgraphs whose number of occurrences in the dataset is at least a given frequency threshold. In recent applications, such as social networks, the underlying graphs are very large, and algorithms for mining frequent subgraphs from a single large graph have developed rapidly. Among these algorithms, GraMi is considered the state of the art. However, GraMi still consumes a lot of time and memory when mining a large graph. In this paper, we extend and optimize GraMi in three ways, improving its running time and reducing its memory consumption during execution. Firstly, GraMi only lists the frequent subgraphs, without reporting the support of each mined subgraph. This is a drawback for decision support systems, which require the support of every subgraph. Therefore, we modify GraMi to compute the support values during the mining process. Secondly, we sort all edges of the graph by frequency, so that edges with low frequency are mined first and edges with high frequency are mined last. This sorting strategy can reduce the number of infrequent subgraph candidates that are generated, especially among large subgraphs, which are usually derived from high-frequency edges. Thirdly, we apply parallel processing, in which each frequent edge is processed concurrently in a separate thread, and we improve this parallel strategy by combining it with the sorting strategy. Our experiments on three real datasets show that both the running time and the memory requirements are better than those of the original GraMi algorithm.
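The sorting and parallel strategies described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a directed graph with labeled nodes and edges, and it assumes a minimum-image-based support for single-edge patterns (the number of distinct nodes at the less-populated endpoint). The names `frequent_edges_sorted`, `mine_in_parallel`, and the `extend` callback are hypothetical and stand in for the actual subgraph-extension step.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def frequent_edges_sorted(node_labels, edges, min_support):
    """Return (pattern, support) pairs for frequent single-edge patterns,
    ordered from lowest to highest support.

    Assumption: edges are directed triples (src, dst, edge_label), and the
    support of a pattern is the smaller of the numbers of distinct source
    and destination nodes that match it.
    """
    # pattern -> (distinct source nodes, distinct destination nodes)
    images = defaultdict(lambda: (set(), set()))
    for src, dst, edge_label in edges:
        pattern = (node_labels[src], edge_label, node_labels[dst])
        sources, targets = images[pattern]
        sources.add(src)
        targets.add(dst)

    supports = {p: min(len(s), len(t)) for p, (s, t) in images.items()}
    frequent = [(p, sup) for p, sup in supports.items() if sup >= min_support]
    # Low-support edges first; ties are broken by pattern for determinism.
    frequent.sort(key=lambda item: (item[1], item[0]))
    return frequent


def mine_in_parallel(frequent, extend, max_workers=4):
    """Submit one task per frequent edge, in sorted order, to a thread pool.

    `extend` is a hypothetical callback (pattern, support) -> result that
    would grow subgraphs starting from the given edge.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [(p, pool.submit(extend, p, sup)) for p, sup in frequent]
        return {p: future.result() for p, future in futures}


# Toy graph: node id -> label, and directed labeled edges.
node_labels = {0: "A", 1: "B", 2: "A", 3: "B", 4: "C"}
edges = [(0, 1, "x"), (2, 3, "x"), (2, 1, "x"),
         (1, 4, "y"), (3, 4, "y"), (0, 4, "z")]

frequent = frequent_edges_sorted(node_labels, edges, min_support=1)
results = mine_in_parallel(frequent, lambda pattern, support: support)
```

Because the support is computed while the edges are counted, it is available alongside each pattern rather than being discarded. In CPython, threads do not speed up CPU-bound work because of the global interpreter lock, so this sketch shows only the task structure; a process pool or a language with true multithreading would be needed to observe a speedup.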
