Efficient mining for association rules with relational database systems

With the tremendous growth of large-scale data repositories, there is now a need to integrate the exploratory techniques of data mining with the ability of relational systems to handle large volumes of data efficiently. We examine the performance of Apriori, the most prevalent association rule mining algorithm, on IBM's DB2 Universal Database system. We show that a multi-column (MC) data model is preferable to the commonly used single-column (SC) data model for association rule mining: with the MC data model we obtain performance improvements by factors of 4.8 to 6 over commercial implementations that use the SC data model. We also provide a new relational operator, Combinations, for an efficient SQL implementation of Apriori inside the database engine; this makes the mining application trivially parallelizable, reliable, and portable.
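The abstract names the two data layouts and the Combinations operator but does not spell out their exact definitions, so the following is only a minimal in-memory sketch under stated assumptions. It assumes the SC layout is a table of (tid, item) rows, that an MC row can be approximated by one sorted tuple of items per transaction, and that Combinations(row, k) emits every k-item subset of a row, which is then matched against the Apriori candidates and grouped to count support. The names sc_to_mc, combinations_op, apriori_gen, apriori and SAMPLE_SC_ROWS are hypothetical illustrations, not the paper's or DB2's API, and the data is made up.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Itemset = Tuple[str, ...]

# Hypothetical sample data in the single-column (SC) layout: one (tid, item) row per item.
SAMPLE_SC_ROWS: List[Tuple[int, str]] = [
    (1, "bread"), (1, "milk"),
    (2, "bread"), (2, "butter"), (2, "milk"),
    (3, "butter"), (3, "milk"),
    (4, "bread"), (4, "butter"), (4, "milk"),
]


def sc_to_mc(sc_rows: Iterable[Tuple[int, str]]) -> Dict[int, Itemset]:
    """Pivot SC rows into an MC-style layout: one sorted item tuple per transaction (assumption)."""
    baskets: Dict[int, Set[str]] = {}
    for tid, item in sc_rows:
        baskets.setdefault(tid, set()).add(item)
    return {tid: tuple(sorted(items)) for tid, items in baskets.items()}


def combinations_op(mc_table: Dict[int, Itemset], k: int) -> Iterator[Tuple[int, Itemset]]:
    """Hypothetical stand-in for the Combinations operator: emit every k-item subset of each row."""
    for tid, items in mc_table.items():
        for combo in combinations(items, k):
            yield tid, combo


def apriori_gen(frequent: Set[Itemset], k: int) -> Set[Itemset]:
    """Apriori candidate generation: join (k-1)-itemsets sharing a prefix, then prune."""
    candidates: Set[Itemset] = set()
    for a in frequent:
        for b in frequent:
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                cand = a + (b[-1],)
                # Prune: every (k-1)-subset of a frequent k-itemset must itself be frequent.
                if all(sub in frequent for sub in combinations(cand, k - 1)):
                    candidates.add(cand)
    return candidates


def apriori(mc_table: Dict[int, Itemset], min_support: int) -> Dict[Itemset, int]:
    """Return every itemset whose support count is at least min_support."""
    counts = Counter(combo for _, combo in combinations_op(mc_table, 1))
    frequent = {c for c, n in counts.items() if n >= min_support}
    result = {c: counts[c] for c in frequent}
    k = 2
    while frequent:
        candidates = apriori_gen(frequent, k)
        if not candidates:
            break
        # Analogue of joining the Combinations output with the candidates and grouping by itemset.
        counts = Counter(
            combo for _, combo in combinations_op(mc_table, k) if combo in candidates
        )
        frequent = {c for c in candidates if counts[c] >= min_support}
        result.update({c: counts[c] for c in frequent})
        k += 1
    return result


if __name__ == "__main__":
    mc = sc_to_mc(SAMPLE_SC_ROWS)
    for itemset, support in sorted(apriori(mc, min_support=2).items()):
        print(itemset, support)
```

In the SC layout, forming the k-item combinations of a transaction in plain SQL typically requires a k-way self-join of the (tid, item) table, whereas in this sketch a single pass over each MC-style row produces them. That contrast is offered as a plausible reading of why the MC model can help; the sketch does not reproduce the paper's DB2 implementation or its measured speedups.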
