Shaping SQL-Based Frequent Pattern Mining Algorithms

Integrating data mining with database management systems could significantly ease the process of knowledge discovery in large databases. We consider implementations of frequent itemset mining (FIM) algorithms, in particular pattern-growth algorithms similar to the top-down FP-growth variants, that are tightly coupled to relational database management systems. Our implementations remain within the confines of conventional relational database facilities such as tables, indices, and SQL operations. We compare our algorithm to the most promising previously proposed SQL-based FIM algorithm. Experiments show that our method performs better in many cases, but it still has severe limitations compared to traditional stand-alone implementations of pattern-growth methods. We identify the bottlenecks of our SQL-based pattern-growth methods and investigate the applicability of tightly coupled algorithms in practice.
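The abstract does not spell out the algorithm itself. As a rough illustration only, the following Python sketch (standard-library `sqlite3`) shows the general idea of keeping pattern growth inside relational facilities. Transactions are stored as a `(tid, item)` table, frequent items are found with `GROUP BY ... HAVING`, and the projected (conditional) database of each frequent item is materialised as a new table through a self-join. This is a simplified projection-based scheme, not the authors' FP-tree-based implementation. The table layout, the hypothetical function name `mine_frequent_itemsets`, and the use of an absolute support count for `min_support` are all assumptions made for illustration.

```python
import sqlite3
from itertools import count


def mine_frequent_itemsets(transactions, min_support):
    """Hypothetical sketch: projection-based pattern growth over SQL tables.

    Assumptions (not from the paper): items are strings, min_support is an
    absolute transaction count, and each projection is a temporary table.
    """
    conn = sqlite3.connect(":memory:")
    # Base table: one row per distinct (transaction id, item) pair.
    conn.execute("CREATE TABLE t0 (tid INTEGER, item TEXT)")
    conn.executemany(
        "INSERT INTO t0 VALUES (?, ?)",
        [(tid, item) for tid, items in enumerate(transactions) for item in set(items)],
    )
    conn.execute("CREATE INDEX t0_item ON t0(item)")
    results = {}
    table_ids = count(1)  # generates unique names for projected tables

    def grow(table, prefix):
        # Count the supporting transactions of each item in this (projected) table.
        frequent = conn.execute(
            f"SELECT item, COUNT(*) FROM {table} "
            "GROUP BY item HAVING COUNT(*) >= ? ORDER BY item",
            (min_support,),
        ).fetchall()
        for item, support in frequent:
            pattern = prefix + (item,)
            results[pattern] = support
            # Project: keep transactions that contain `item`, restricted to
            # items that sort after it, so each itemset is generated once.
            child = f"t{next(table_ids)}"
            conn.execute(f"CREATE TABLE {child} (tid INTEGER, item TEXT)")
            conn.execute(
                f"INSERT INTO {child} SELECT p.tid, p.item FROM {table} p "
                f"JOIN {table} q ON p.tid = q.tid "
                "WHERE q.item = ? AND p.item > ?",
                (item, item),
            )
            grow(child, pattern)
            conn.execute(f"DROP TABLE {child}")

    grow("t0", ())
    conn.close()
    return results


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    for itemset, support in sorted(mine_frequent_itemsets(data, 3).items()):
        print(itemset, support)
```

In this sketch every recursion step issues one aggregate query plus one self-join and table creation per frequent item. Per-query overhead of this kind is one plausible reason why SQL-based pattern growth can lag behind in-memory FP-tree implementations, though the sketch makes no claim about the specific bottlenecks the paper identifies.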
