Mining a large database with a parallel database server

Data mining is a data-intensive computational activity, and parallel processing has often been used in data mining algorithms. However, when the data do not fit in memory, some of these solutions no longer apply, and a database system may be required instead of flat files. Most implementations couple the database system only loosely with the data mining techniques: the database system merely answers queries, while the mining computation itself runs on the client machine. In this work, we address the data-consuming activities through parallel processing on a parallel database server that is tightly integrated with the data mining techniques. Our experimental results show the potential benefits of this integration. Despite the difficulty of processing such a complex application, we extracted rules and obtained high performance on all of the data-intensive activities, including decision tree construction, pruning, and rule extraction.
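The difference between loose and tight coupling can be illustrated with a minimal sketch. It assumes a hypothetical single-table schema (a table `train` with columns `outlook` and `label`) and uses Python's built-in `sqlite3` as a stand-in for the database server; SQLite does not run the query in parallel, so the sketch only shows where the work happens, not the parallel execution studied here. The per-value class counts needed to score a candidate decision-tree split are computed by a `GROUP BY` inside the database, so only a few aggregate rows reach the client, whereas a loosely coupled client would fetch every row and count it locally. Information gain is used as the split criterion purely for illustration and is not necessarily the criterion used in this work.

```python
import math
import sqlite3
from collections import defaultdict

# Hypothetical training data: one categorical attribute and a class label.
ROWS = [
    ("sunny", "no"), ("sunny", "no"), ("overcast", "yes"),
    ("rain", "yes"), ("rain", "yes"), ("rain", "no"),
    ("overcast", "yes"), ("sunny", "yes"),
]


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def class_counts_in_db(conn, attr):
    """Tightly coupled step: the server aggregates, only counts reach the client."""
    # attr is interpolated into SQL, so it must come from a trusted column list.
    query = f"SELECT {attr}, label, COUNT(*) FROM train GROUP BY {attr}, label"
    counts = defaultdict(dict)
    for value, label, n in conn.execute(query):
        counts[value][label] = n
    return counts


def information_gain(counts):
    """Information gain of splitting on an attribute, from per-value class counts."""
    totals = defaultdict(int)
    for per_label in counts.values():
        for label, n in per_label.items():
            totals[label] += n
    n_all = sum(totals.values())
    # Weighted entropy of the partitions induced by each attribute value.
    remainder = sum(
        sum(per_label.values()) / n_all * entropy(list(per_label.values()))
        for per_label in counts.values()
    )
    return entropy(list(totals.values())) - remainder


# sqlite3 stands in for the (parallel) database server; this is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE train (outlook TEXT, label TEXT)")
conn.executemany("INSERT INTO train VALUES (?, ?)", ROWS)
gain = information_gain(class_counts_in_db(conn, "outlook"))
print(f"gain(outlook) = {gain:.3f}")
```

Pushing the aggregation to the server keeps the amount of transferred data proportional to the number of distinct (value, class) pairs rather than to the number of rows, which is what makes server-side processing attractive when the data do not fit in client memory.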