Integrating Association Rule Mining Algorithms with Relational Database Systems

Mining for association rules is one of the fundamental data mining methods. In this paper we describe how to efficiently integrate association rule mining algorithms with relational database systems. In our view, giving the algorithms direct access to the database system is a basic requirement for transferring data mining technology into daily operation. This is especially true for large data warehouses, where exporting the mining data and preparing it outside the database system becomes cumbersome or even infeasible. The development of our own approach is mainly motivated by shortcomings of current solutions. We investigate the most challenging problems by contrasting the prototypical but somewhat academic association mining scenario of basket analysis with a real-world application. We compile the requirements that arise from mining an operational data warehouse at DaimlerChrysler, generalize them, and address them by developing our own approach. We explain its basic design and give the details of our implementation. Using this warehouse, we evaluate our approach together with commercial mining solutions. With regard to runtime and scalability, our approach clearly outperforms the commercial tools accessible to us. More importantly, it supports mining tasks that are not directly addressable by commercial mining solutions.
