Efficient Mining of Closed Tree Patterns from Large Tree Databases with Subtree Constraint

Mining frequent tree patterns from tree databases is of practical importance in domains such as Web mining and bioinformatics. Although efficient tree mining algorithms exist, they often lack interpretability: they produce a huge number of patterns, most of which are meaningless to users. This paper addresses both demands: computational cost, that is, efficient generation of tree patterns, and interpretability. Meeting both requires an efficient method for incorporating users' needs into the mining process. We propose a new top-down method for mining unordered closed tree patterns from a database of trees such that every mined pattern contains a common piece of information, specified by the user in the form of a tree. This type of mining is called mining with a subtree constraint, and it is useful, for example, in Web mining and bioinformatics, where users want to extract common patterns around some given information from the original data. The proposed algorithm is tested and compared with a state-of-the-art tree mining algorithm on real and artificial datasets, with very good results.
