Processing frequent itemset discovery queries by division and set containment join operators

SQL-based data mining algorithms are rarely used in practice today. Most performance experiments have shown that SQL-based approaches are inferior to main-memory algorithms. Nevertheless, database vendors try to integrate some analysis functionality into their query execution and optimization components in order to narrow the gap between data and processing. Such database support is particularly important when data mining applications need to analyze very large datasets, or when they need to access current data rather than a possibly outdated copy of it. We investigate SQL-based approaches to the problem of finding frequent itemsets in a transaction table, including an algorithm we recently proposed, called Quiver, which employs universal and existential quantifications. This approach uses a table schema for itemsets that is similar to the commonly used vertical layout for transactions: each item of an itemset is stored in a separate row. We argue that expressing the frequent itemset discovery problem using quantifications offers interesting opportunities to process such queries with set containment join or set containment division operators, which are not yet available in commercial database systems. Initial performance experiments reveal that commercial DBMSs cannot process Quiver efficiently. However, our experiments with query execution plans that use operators realizing set containment tests suggest that Quiver can be processed efficiently.
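The Python sketch below illustrates the semantics of the two operators on the vertical layout described above. Both transactions `(tid, item)` and candidate itemsets `(itemset_id, item)` are stored one item per row. Relational division returns the transactions that contain *every* item of a given itemset, which is a universal quantification. A set containment join pairs each candidate itemset with every transaction that contains it, and counting those pairs per itemset yields its support. This is not Quiver itself, which is formulated in SQL, and it is not any of the efficient operator algorithms discussed in the literature. The function names `group_rows`, `divide`, `set_containment_join`, and `frequent_itemsets` are hypothetical, and both operators are written as naive nested loops purely for illustration. The candidate itemsets are assumed to be given.

```python
from collections import defaultdict

# Hypothetical illustration: vertical layout, one (key, item) row per item,
# used for both transactions (tid, item) and candidate itemsets (itemset_id, item).


def group_rows(rows):
    """Collapse vertical-layout rows (key, item) into {key: frozenset of items}."""
    groups = defaultdict(set)
    for key, item in rows:
        groups[key].add(item)
    return {key: frozenset(items) for key, items in groups.items()}


def divide(transaction_rows, divisor_items):
    """Relational division: tids that have a row for *every* item in divisor_items.

    Universal quantification: for all items i in the divisor, there exists a row (tid, i).
    """
    divisor = frozenset(divisor_items)
    return {tid for tid, items in group_rows(transaction_rows).items() if divisor <= items}


def set_containment_join(itemset_rows, transaction_rows):
    """Naive nested-loop set containment join: all (itemset_id, tid) with itemset ⊆ transaction."""
    itemsets = group_rows(itemset_rows)
    transactions = group_rows(transaction_rows)
    return [
        (iid, tid)
        for iid, iset in itemsets.items()
        for tid, tset in transactions.items()
        if iset <= tset
    ]


def frequent_itemsets(itemset_rows, transaction_rows, minsup):
    """Support = number of containing transactions; keep candidates with support >= minsup.

    Assumes minsup >= 1, since candidates with zero support never appear in the join result.
    """
    support = defaultdict(int)
    for iid, _tid in set_containment_join(itemset_rows, transaction_rows):
        support[iid] += 1
    return {iid: n for iid, n in support.items() if n >= minsup}


if __name__ == "__main__":
    # Transactions 1={a,b,c}, 2={a,c}, 3={b,c}, 4={a,b,c} in vertical layout.
    transactions = [(1, "a"), (1, "b"), (1, "c"), (2, "a"), (2, "c"),
                    (3, "b"), (3, "c"), (4, "a"), (4, "b"), (4, "c")]
    # Candidate 2-itemsets i1={a,c}, i2={b,c}, i3={a,b}, also one item per row.
    candidates = [("i1", "a"), ("i1", "c"), ("i2", "b"), ("i2", "c"),
                  ("i3", "a"), ("i3", "b")]
    print(sorted(divide(transactions, {"a", "c"})))        # [1, 2, 4]
    print(frequent_itemsets(candidates, transactions, 3))  # {'i1': 3, 'i2': 3}
```

Both formulations compute the same support here. For example, `divide(transactions, {"a", "c"})` returns three tids, which matches the count for `i1` obtained from the join. The join form evaluates all candidates in one operator call instead of one division per itemset.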
