Chopper: Efficient algorithm for tree mining

With the development of the Internet, frequent pattern mining has been extended to more complex patterns such as trees and graphs. Such applications arise in complex domains such as bioinformatics and web mining. In this paper, we present a novel algorithm, named Chopper, to discover frequent subtrees from ordered labeled trees. An extensive performance study shows that the new algorithm outperforms TreeMiner V, one of the fastest previously proposed methods, in mining large databases. At the end of the paper, potential improvements to Chopper are discussed.
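The abstract does not describe how Chopper works internally, so the sketch below only illustrates the underlying problem, not Chopper or TreeMiner V. It is a brute-force Python checker under stated assumptions. Ordered labeled trees are represented as nested `Node` objects. A pattern "occurs" in a tree if it maps onto an induced ordered subtree, meaning pattern edges map to parent-child edges and sibling order is preserved. Support is the number of database trees that contain the pattern at least once, and a pattern is frequent when its support reaches a threshold `minsup`. All names (`Node`, `t`, `contains`, `support`, `is_frequent`, `minsup`) and the choice of induced rather than embedded subtrees are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List

# Hypothetical representation of an ordered labeled tree (assumption, not from the paper).
@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)

def t(label: str, *children: Node) -> Node:
    """Convenience constructor: t("a", t("b"), t("c"))."""
    return Node(label, list(children))

def matches_at(pattern: Node, node: Node) -> bool:
    """Pattern root maps onto `node`; pattern children map, left to right,
    onto a subsequence of node's children (induced, order-preserving)."""
    if pattern.label != node.label:
        return False
    j = 0
    for p in pattern.children:
        # Greedily take the leftmost remaining child that p can map onto;
        # taking the earliest match leaves the most room for later siblings.
        while j < len(node.children) and not matches_at(p, node.children[j]):
            j += 1
        if j == len(node.children):
            return False
        j += 1
    return True

def iter_nodes(root: Node) -> Iterator[Node]:
    stack = [root]
    while stack:
        n = stack.pop()
        yield n
        stack.extend(n.children)

def contains(tree: Node, pattern: Node) -> bool:
    return any(matches_at(pattern, n) for n in iter_nodes(tree))

def support(database: List[Node], pattern: Node) -> int:
    # Transaction-based support: count trees, not occurrences.
    return sum(1 for tree in database if contains(tree, pattern))

def is_frequent(database: List[Node], pattern: Node, minsup: int) -> bool:
    return support(database, pattern) >= minsup

# Small illustrative database (made-up data).
db = [
    t("a", t("b"), t("c", t("d"))),
    t("a", t("c", t("d")), t("b")),          # same labels, different sibling order
    t("x", t("a", t("b"), t("e"), t("c"))),
]

if __name__ == "__main__":
    print(support(db, t("a", t("b"), t("c"))))  # 2: the second tree has c before b
```

Because the trees are ordered, `a(b, c)` is not contained in the second tree even though it has the same labels. A real miner such as Chopper or TreeMiner V avoids testing every candidate against every tree in this way; the sketch only fixes what "frequent subtree" means.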
