Paper information - gSpan: graph-based substructure pattern mining

We investigate new approaches for frequent graph-based pattern mining in graph datasets and propose a novel algorithm called gSpan (graph-based substructure pattern mining), which discovers frequent substructures without candidate generation. gSpan builds a new lexicographic order among graphs, and maps each graph to a unique minimum DFS code as its canonical label. Based on this lexicographic order, gSpan adopts the depth-first search strategy to mine frequent connected subgraphs efficiently. Our performance study shows that gSpan substantially outperforms previous algorithms, sometimes by an order of magnitude.
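
The idea of a minimum DFS code as a canonical label can be illustrated with a small brute-force sketch. It is not the gSpan implementation: it enumerates every depth-first traversal of a small connected, undirected, labeled graph, writes each traversal as a sequence of edge tuples `(i, j, label_i, edge_label, label_j)` over discovery indices, and keeps the smallest sequence. Two assumptions are made here for illustration: the function name `min_dfs_code` and the input layout are hypothetical, and sequences are compared with plain Python tuple ordering as a stand-in for the paper's DFS lexicographic order. Any fixed total order suffices to show the key property, which is that isomorphic graphs receive the same label.

```python
def min_dfs_code(vertex_labels, edges):
    """Brute-force canonical label for a small connected labeled graph.

    vertex_labels: {vertex: label}
    edges: list of (u, v, edge_label), undirected
    Assumption: codes are ordered by plain tuple comparison, which is a
    simplification of the DFS lexicographic order defined in the paper.
    """
    adj = {v: {} for v in vertex_labels}
    for u, v, lab in edges:
        adj[u][v] = lab
        adj[v][u] = lab

    def extend(code, disc, stack):
        # Backtrack until the top of the stack has an undiscovered neighbor.
        while stack and all(w in disc for w in adj[stack[-1]]):
            stack = stack[:-1]
        if not stack:
            yield code  # every vertex discovered, every edge emitted once
            return
        v = stack[-1]
        for w in adj[v]:
            if w in disc:
                continue
            t = len(disc)  # discovery index of the new vertex w
            forward = (disc[v], t, vertex_labels[v], adj[v][w], vertex_labels[w])
            # Edges from w back to already discovered vertices (its ancestors),
            # emitted right after the forward edge that discovers w.
            backward = sorted(
                (t, disc[u], vertex_labels[w], adj[w][u], vertex_labels[u])
                for u in adj[w]
                if u in disc and u != v
            )
            yield from extend(
                code + (forward,) + tuple(backward),
                {**disc, w: t},
                stack + (w,),
            )

    return min(
        code
        for start in vertex_labels
        for code in extend((), {start: 0}, (start,))
    )


# A labeled triangle with one pendant vertex.
g1 = ({0: "X", 1: "Y", 2: "Z", 3: "X"},
      [(0, 1, "a"), (1, 2, "b"), (2, 0, "a"), (2, 3, "c")])
# The same graph with its vertices renumbered (0->2, 1->3, 2->0, 3->1).
g2 = ({2: "X", 3: "Y", 0: "Z", 1: "X"},
      [(3, 0, "b"), (2, 3, "a"), (0, 2, "a"), (0, 1, "c")])
# A different graph: the pendant edge carries label "a" instead of "c".
g3 = ({0: "X", 1: "Y", 2: "Z", 3: "X"},
      [(0, 1, "a"), (1, 2, "b"), (2, 0, "a"), (2, 3, "a")])

print(min_dfs_code(*g1) == min_dfs_code(*g2))  # isomorphic graphs
print(min_dfs_code(*g1) == min_dfs_code(*g3))  # non-isomorphic graphs
```

Enumerating all traversals is exponential and only workable for very small graphs. It serves to show why a canonical label lets a miner recognize that two differently numbered graphs are the same pattern.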

Jiawei Han | Xifeng Yan
