A-RAFF: A Ranked Frequent Pattern-growth Subgraph Pattern Discovery Approach

Graph mining is a branch of data mining in which voluminous, complex data are represented as graphs and mined to infer useful knowledge. Frequent subgraph mining (FSM) is an active research field and is considered the core of graph mining. FSM is defined as finding all subgraph patterns that occur frequently over an entire set of graphs. It is used extensively in graph clustering, classification, and index building in databases. Different FSM algorithms have been proposed in the literature, such as AGM, FSG, SPIN, SUBDUE, gSpan, FFSM, CloseGraph, and GREW. Most of these techniques perform very well on small to medium-sized graph datasets, but the computational cost of FSM becomes critical as graph size increases. In addition, the number of frequent subgraph patterns grows exponentially with the size of the graph dataset. Consequently, this research work proposes a novel FSM approach, A RAnked Frequent pattern-growth Framework (A-RAFF). This preliminary work studies how to make A-RAFF computationally efficient while avoiding the generation of a huge number of useless frequent subgraph patterns. A-RAFF achieves efficiency by embedding the ranking of discovered frequent subgraphs in the mining process. Experiments on three real benchmark graph datasets show that the mining results of A-RAFF are very promising compared with existing FSM techniques.
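To make the FSM definition concrete, the following minimal sketch counts the support of the smallest possible patterns (single labeled edges) over a toy set of graphs and keeps those that meet a minimum support threshold. It is not the A-RAFF algorithm: the graph encoding, the function names `edge_pattern` and `frequent_edges`, the `min_support` parameter, and the ordering by support are all assumptions made for illustration, and the abstract does not specify which ranking measure A-RAFF uses.

```python
from collections import Counter

# Assumed encoding: each graph is a list of labeled, undirected edges
# (vertex_label_u, edge_label, vertex_label_v).

def edge_pattern(u, e, v):
    # Order the endpoint labels so that (C, single, O) and (O, single, C)
    # are treated as the same undirected pattern.
    a, b = sorted((u, v))
    return (a, e, b)

def frequent_edges(graphs, min_support):
    # Support of a pattern = number of graphs that contain it at least once.
    support = Counter()
    for g in graphs:
        patterns = {edge_pattern(u, e, v) for (u, e, v) in g}
        support.update(patterns)  # each pattern counted once per graph
    frequent = {p: s for p, s in support.items() if s >= min_support}
    # Illustrative ordering only: highest support first, ties by pattern.
    return sorted(frequent.items(), key=lambda kv: (-kv[1], kv[0]))

graphs = [
    [("C", "single", "O"), ("C", "single", "C")],
    [("O", "single", "C"), ("C", "double", "O")],
    [("C", "single", "C"), ("C", "single", "O")],
]
result = frequent_edges(graphs, min_support=2)
for pattern, count in result:
    print(pattern, count)
```

A pattern-growth miner would extend each frequent edge one edge at a time and recount support. The number of candidate extensions grows quickly with graph size, which is the cost the abstract refers to.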
