Studies in Frequent Tree Mining

Applying data mining techniques to structured data is particularly challenging, because the structure of the data is commonly assumed to encode part of its semantics. As a result, classical data mining techniques are insufficient to analyze and mine such data. In this thesis we develop several mining algorithms for tree-structured data and discuss some applications. Moreover, we focus on algorithms that retrieve only a small subset of all potentially interesting patterns, while the overall quality of the retrieved subset is as good as that of the complete set of patterns. The results show that, besides yielding a smaller set of more focused patterns, the proposed algorithms are far more efficient than existing algorithms.
