Using trees to mine multirelational databases

This paper proposes a new approach to mining multirelational databases. Our approach is based on representing multirelational databases as sets of trees, for which we propose two alternative representation schemes. Tree mining techniques can thus serve as the basis for multirelational data mining techniques, such as multirelational classification or multirelational clustering. We analyze the differences between identifying induced and embedded tree patterns in the proposed tree-based representation schemes, and we study the relationships among the sets of tree patterns that can be discovered in each case. This paper also describes how the frequent tree patterns discovered in these representations can be used, for instance, to mine association rules in multirelational databases.
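The abstract does not spell out either of the two representation schemes. The following Python sketch therefore uses one hypothetical encoding to illustrate the general idea.

* Each row of a target table becomes a tree rooted at the table name.
* Non-key attributes become `attr=value` leaves.
* Rows in other tables that reference the row through a foreign key become subtrees.

The sketch then contrasts induced and embedded occurrences of a two-node pattern. An induced occurrence requires a direct parent-child edge. An embedded occurrence only requires an ancestor-descendant relationship. The schema (`customers`, `orders`), the `fks` mapping, and the helpers `build_tree`, `occurs` and `support` are all illustrative assumptions. They are not the paper's actual schemes or algorithms.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list[Node] = field(default_factory=list)


def build_tree(table, row, db, fks):
    """Encode one row as a tree (hypothetical encoding, assumes an acyclic schema).

    Root: the table name. Children: one leaf per non-key attribute
    ("attr=value"), plus one subtree per row of another table whose
    foreign key references this row's "id".
    """
    node = Node(table)
    for attr, value in row.items():
        # Skip the primary key and this table's own foreign-key columns.
        if attr != "id" and attr not in fks.get(table, {}):
            node.children.append(Node(f"{attr}={value}"))
    # Follow incoming foreign keys: child_table.fk_attr -> table.id
    for child_table, refs in fks.items():
        for fk_attr, target in refs.items():
            if target != table:
                continue
            for child_row in db[child_table]:
                if child_row[fk_attr] == row["id"]:
                    node.children.append(build_tree(child_table, child_row, db, fks))
    return node


def occurs(tree, ancestor, descendant, induced):
    """True if some node labelled `ancestor` has a node labelled `descendant`
    as a direct child (induced) or anywhere below it (embedded)."""
    def below(node):
        for child in node.children:
            if child.label == descendant:
                return True
            if not induced and below(child):
                return True
        return False

    stack = [tree]
    while stack:
        node = stack.pop()
        if node.label == ancestor and below(node):
            return True
        stack.extend(node.children)
    return False


def support(forest, ancestor, descendant, induced):
    """Fraction of trees in the forest that contain the pattern."""
    return sum(occurs(t, ancestor, descendant, induced) for t in forest) / len(forest)


# Toy multirelational database: orders reference customers.
db = {
    "customers": [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}],
    "orders": [
        {"id": 10, "customer_id": 1, "item": "book"},
        {"id": 11, "customer_id": 2, "item": "pen"},
        {"id": 12, "customer_id": 2, "item": "book"},
    ],
}
fks = {"orders": {"customer_id": "customers"}}

# One tree per customer row.
forest = [build_tree("customers", r, db, fks) for r in db["customers"]]
print(support(forest, "customers", "item=book", induced=False))  # embedded: 1.0
print(support(forest, "customers", "item=book", induced=True))   # induced: 0.0
print(support(forest, "orders", "item=book", induced=True))      # induced: 1.0
```

Here `item=book` sits two levels below each `customers` root, under an `orders` node. The pattern customers→item=book is therefore frequent as an embedded pattern but never occurs as an induced one. In this encoding, every induced occurrence is also an embedded occurrence. This is one simple instance of the kind of relationship between pattern sets that the abstract mentions.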
