Nearly Exact Mining of Frequent Trees in Large Networks

Mining frequent patterns in a single network (graph) poses a number of challenges. Even matching a single path pattern to a network (up to subgraph isomorphism) is NP-complete. Existing matching algorithms become intractable even for reasonably small patterns on networks that are large or have a high average degree. Based on recent advances in parameterized complexity theory, we propose a novel miner for rooted trees in networks. For a fixed parameter k (the maximal pattern size), the miner can mine all rooted trees with delay linear in the size of the network and only mildly exponential in k (2^k). This allows us to mine rooted trees tractably in large networks such as the WWW or social networks. We establish the practical applicability of our miner through an experimental evaluation on both synthetic and real-world data.
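As a rough illustration of how a matching cost can be linear in the network and exponential only in the pattern size k, the sketch below uses color coding (Alon, Yuster and Zwick), a standard randomized parameterized technique. It is not necessarily the authors' method: color coding needs on the order of e^k random colorings, so its total cost is roughly (2e)^k times linear rather than the 2^k stated above. It handles only the simplest rooted pattern, a labeled path, and assumes an undirected network given as adjacency lists with one label per vertex. The function names `colorful_rooted_path_roots` and `rooted_path_roots` are hypothetical. Each coloring assigns one of k colors to every vertex. A path whose k vertices receive distinct colors ("colorful") is necessarily simple, so the dynamic program over color subsets never reports a false match.

```python
import math
import random
from typing import Dict, Hashable, List, Sequence, Set

Graph = Dict[Hashable, List[Hashable]]  # undirected adjacency lists (assumed input format)


def colorful_rooted_path_roots(graph: Graph, labels: Dict[Hashable, str],
                               pattern: Sequence[str],
                               coloring: Dict[Hashable, int]) -> Set[Hashable]:
    """Return the vertices v that root a colorful path v = x0, x1, ..., x_{k-1}
    whose vertex labels equal pattern[0], ..., pattern[k-1]."""
    if not pattern:
        raise ValueError("pattern must contain at least one label")
    k = len(pattern)
    # masks[v]: color sets (bitmasks) of colorful, label-matching paths that
    # start at v and cover the pattern suffix pattern[i:]; built for i = k-1 down to 0.
    masks = {v: {1 << coloring[v]} for v in graph if labels[v] == pattern[k - 1]}
    for i in range(k - 2, -1, -1):
        new_masks = {}
        for v in graph:
            if labels[v] != pattern[i]:
                continue
            bit = 1 << coloring[v]
            # Extend a suffix path starting at neighbor w only if v's color is unused.
            found = {s | bit for w in graph[v] for s in masks.get(w, ())
                     if not s & bit}
            if found:
                new_masks[v] = found
        masks = new_masks
    return set(masks)


def rooted_path_roots(graph: Graph, labels: Dict[Hashable, str],
                      pattern: Sequence[str], trials: int = None,
                      seed: int = 0) -> Set[Hashable]:
    """One-sided Monte Carlo: every returned vertex truly roots a matching simple
    path; a true root is missed with probability at most exp(-trials / e^k)."""
    k = len(pattern)
    if trials is None:
        trials = math.ceil(3 * math.e ** k)  # about e^k colorings per constant success rate
    rng = random.Random(seed)
    roots: Set[Hashable] = set()
    for _ in range(trials):
        coloring = {v: rng.randrange(k) for v in graph}
        roots |= colorful_rooted_path_roots(graph, labels, pattern, coloring)
    return roots
```

For example, on `graph = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}` with labels a, b, a, c, the pattern `["a", "b", "a"]` has exactly two genuine roots, 1 and 3 (paths 1-2-3 and 3-2-1). Any vertex the function returns is such a root, and a true root is missed only with small probability. Extending the recurrence from paths to rooted trees would mean merging the color sets of child subtrees, which must be pairwise disjoint.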