Data mining: a tightly-coupled implementation on a parallel database server

Due to the increasing difficulty of discovering patterns in real-world databases using only conventional OLAP tools, an automated process such as data mining has become essential. Because data mining over large data sets can take a prohibitive amount of time, owing to the computational complexity of the algorithms, parallel processing has often been used as a solution. However, when the data do not fit in memory, some of these solutions no longer apply, and a database system may be required instead of flat files. Most implementations couple the database system only loosely with the data mining algorithms. We address the data-intensive activities through parallel processing and data fragmentation on the database server, providing a tight integration with data mining techniques. Experimental results show that the potential benefits of this integration were realized, despite the difficulties of processing a complex application.
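The abstract does not specify which mining algorithm or fragmentation scheme is used. As a minimal sketch, assuming a classifier that repeatedly needs (attribute value, class) frequency counts (the kind of `SELECT value, class, COUNT(*) ... GROUP BY` query a tightly-coupled implementation can push to the server), the code below shows why horizontal data fragmentation enables parallel processing. Each fragment is scanned independently and the partial counts are summed, giving the same totals as a single full scan. The two-column row schema, the round-robin placement, and the names `fragment`, `local_counts` and `parallel_counts` are hypothetical. Threads stand in for the database server's parallel processes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical row schema: (attribute value, class label).
Row = tuple[str, str]


def fragment(rows: list[Row], n: int) -> list[list[Row]]:
    """Round-robin horizontal fragmentation of the table into n fragments."""
    return [rows[i::n] for i in range(n)]


def local_counts(frag: list[Row]) -> Counter:
    """Count (value, class) pairs in one fragment, like a per-fragment GROUP BY."""
    return Counter(frag)


def parallel_counts(rows: list[Row], n: int = 4) -> Counter:
    """Scan all fragments in parallel, then merge the partial counts."""
    frags = fragment(rows, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        partials = list(pool.map(local_counts, frags))
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return total


rows = [("sunny", "no"), ("sunny", "no"), ("rain", "yes"),
        ("overcast", "yes"), ("rain", "no")]
print(parallel_counts(rows, n=2))
```

Because counting is an associative, commutative aggregate, the merge step is a simple sum, so the number of fragments does not change the result, only how the scanning work is divided.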
