Giraph Dynamic Sized Structure Recurrent Subgraph Generation Algorithm for Frequent Subgraph Mining

Frequent Subgraph Mining (FSM) is a subfield of data mining and a demanding area that underpins graph classification and graph clustering, which are used in social networks, chemical compounds, biological datasets, and the enterprise world. For the last few decades, many researchers have studied how to produce an effective and optimized technique for extracting candidate subgraphs while eliminating duplicates. In the Giraph distributed system, different input and output format classes are required to load graphs into memory and to write them out after an operation completes, which leads to excessive memory consumption. In this paper, a novel methodology, “Giraph Dynamic Sized Structure Frequent Subgraph Mining (GDSSFSM)”, is developed to reduce the memory requirement of FSM in a distributed graph system. The proposed approach reorganizes the inner input format class (i.e., setEdgeInputFormatClass) without changing it; hence, it can be used by default in a customized format. The approach is compared experimentally with an existing algorithm on different datasets in terms of execution time and memory requirements; the results show a decrease of up to 52% on average, depending on the dataset and the edge-centric graph algorithm (i.e., PageRank, Connected Components, and Simple Shortest Path). The proposed algorithm can be used in various fields of graph mining, such as social networks, bioinformatics, and web data mining.
