A Perspective on Databases and Data Mining

We discuss the use of database methods for data mining. Recently, impressive results have been achieved for some data mining problems using highly specialized and clever data structures. We study how well one can manage using a general-purpose database management system (DBMS) instead. We illustrate our ideas by investigating the use of a DBMS in a well-researched area: the discovery of association rules. We present a simple algorithm, consisting only of union and intersection operations, and show that it achieves quite good performance on an efficient DBMS. Our method can easily incorporate inheritance hierarchies into the association rule algorithm. We also present a technique that effectively reduces the number of database operations when searching large search spaces that contain only a few interesting items. Our work shows that database techniques are promising for data mining: general architectures can achieve reasonable results.
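
To make the idea of mining with only union and intersection operations concrete, here is a minimal sketch in Python. It is not the algorithm from the paper, whose details the abstract does not give. It assumes a vertical layout in which each item maps to the set of transaction ids that contain it, builds candidate itemsets by set union, and obtains their support by set intersection. The function name `frequent_itemsets`, the `min_support` parameter, and the sample data are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for all itemsets meeting min_support.

    Illustrative sketch only: assumes a vertical (item -> transaction ids)
    layout and a level-wise search; not taken from the paper.
    """
    # Build the vertical layout: one set of transaction ids per single item.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(frozenset([item]), set()).add(tid)

    # Level 1: keep only the frequent single items.
    level = {k: v for k, v in tidsets.items() if len(v) >= min_support}
    result = {}

    while level:
        result.update({k: len(v) for k, v in level.items()})
        next_level = {}
        for (a, tids_a), (b, tids_b) in combinations(level.items(), 2):
            candidate = a | b  # union of two itemsets of size k
            # Keep only candidates of size k + 1 that were not yet generated.
            if len(candidate) != len(a) + 1 or candidate in next_level:
                continue
            # Prune: every size-k subset of the candidate must be frequent.
            if any(candidate - {x} not in level for x in candidate):
                continue
            # Intersection gives the transactions containing the candidate.
            tids = tids_a & tids_b
            if len(tids) >= min_support:
                next_level[candidate] = tids
        level = next_level
    return result


# Hypothetical sample data for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
supports = frequent_itemsets(baskets, min_support=2)
```

In a relational setting, each transaction-id set would correspond to a stored relation, and the `|` and `&` operations to union and intersection queries issued to the DBMS. The pruning step reflects the goal of issuing few database operations when only a few itemsets are interesting.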
