Mining Closed Frequent Free Trees in Graph Databases

Free tree, as a special graph which is connected, undirected and acyclic, has been extensively used in bioinformatics, pattern recognition, computer networks, XML databases, etc. Recent research on structural pattern mining has focused on an important problem of discovering frequent free trees in large graph databases. However, it can be prohibitive due to the presence of an exponential number of frequent free trees in the graph database. In this paper, we propose a computationally efficient algorithm that discovers only closed frequent free trees in a database of labeled graphs. A free tree t is closed if there exist no supertrees of t that has the same frequency of t. Two pruning algorithms, the safe position pruning and the safe label pruning, are proposed to efficiently detect unsatisfactory search spaces with no closed frequent free trees generated. Based on the special characteristics of free tree, the automorphism-based pruning and the canonical mapping-based pruning are introduced to facilitate the mining process. Our performance study shows that our algorithm not only reduces the number of false positives generated but also improves the mining efficiency, especially in the presence of large frequent free tree patterns in the graph database.

[1]  George Karypis,et al.  Frequent subgraph discovery , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[2]  Yun Chi,et al.  Mining Closed and Maximal Frequent Subtrees from Databases of Labeled Rooted Trees , 2005, IEEE Trans. Knowl. Data Eng..

[3]  Jeffrey Xu Yu,et al.  Fast Frequent Free Tree Mining in Graph Databases , 2006, ICDM Workshops.

[4]  Xifeng Yan,et al.  CloSpan: Mining Closed Sequential Patterns in Large Datasets , 2003, SDM.

[5]  Yun Chi,et al.  Indexing and mining free trees , 2003, Third IEEE International Conference on Data Mining.

[6]  Stefan Kramer,et al.  Frequent free tree discovery in graph data , 2004, SAC '04.

[7]  Nicolas Pasquier,et al.  Discovering Frequent Closed Itemsets for Association Rules , 1999, ICDT.

[8]  Wei Wang,et al.  Efficient mining of frequent subgraphs in the presence of isomorphism , 2003, Third IEEE International Conference on Data Mining.

[9]  David S. Johnson,et al.  Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978 .

[10]  Joost N. Kok,et al.  A quickstart in frequent structure mining can make a difference , 2004, KDD.

[11]  Jiawei Han,et al.  CloseGraph: mining closed frequent graph patterns , 2003, KDD '03.

[12]  Jiawei Han,et al.  gSpan: graph-based substructure pattern mining , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[13]  Malcolm P. Atkinson,et al.  Issues Raised by Three Years of Developing PJama: An Orthogonally Persistent Platform for Java , 1999, ICDT.