Paper information - Data mining: a tightly-coupled implementation on a parallel database server

Due to the increasing difficulty of discovering patterns in real-world databases using only conventional OLAP tools, an automated process such as data mining is currently essential. As data mining over large data sets can take a prohibitive amount of time because of the computational complexity of the algorithms, parallel processing has often been used as a solution. However, when the data does not fit in memory, some solutions do not apply and a database system may be required rather than flat files. Most implementations use a database system loosely coupled with the data mining algorithms. We address the data-consuming activities through parallel processing and data fragmentation on the database server, providing a tight integration with data mining techniques. Experimental results show that the potential benefits of this integration were obtained, despite the difficulties of processing a complex application.

Marta Mattoso | Nelson F. F. Ebecken | Mauro Sousa

[1] J. Ross Quinlan et al. C4.5: Programs for Machine Learning, 1992.

[2] Jorma Rissanen et al. SLIQ: A Fast Scalable Classifier for Data Mining, 1996, EDBT.

[3] David J. DeWitt et al. Data placement in shared-nothing parallel database systems, 1997, The VLDB Journal.

[4] Alberto Maria Segre et al. Programs for Machine Learning, 1994.

[5] Leo Breiman et al. Classification and Regression Trees, 1984.

[6] Usama M. Fayyad et al. Automating the Analysis and Cataloging of Sky Surveys, 1996, Advances in Knowledge Discovery and Data Mining.

[7] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.

[8] Alex Alves Freitas et al. Mining Very Large Databases with Parallel Processing, 1997, The Kluwer International Series on Advances in Database Systems.

[9] Tomasz Imielinski et al. An Interval Classifier for Database Mining Applications, 1992, VLDB.

[10] Ramakrishnan Srikant et al. The Quest Data Mining System, 1996, KDD.

[11] Jiawei Han et al. DBMiner: A System for Mining Knowledge in Large Relational Databases, 1996, KDD.

[12] Gary Hallmark. Oracle parallel warehouse server, 1997, Proceedings 13th International Conference on Data Engineering.

[13] Kyuseok Shim et al. Developing Tightly-Coupled Data Mining Applications on a Relational Database System, 1996, KDD.

[14] Marcel Holsheimer et al. Data Surveyor: Searching the Nuggets in Parallel, 1996, Advances in Knowledge Discovery and Data Mining.

[15] Rakesh Agrawal et al. SPRINT: A Scalable Parallel Classifier for Data Mining, 1996, VLDB.

[16] Shamkant B. Navathe et al. Vertical partitioning for database design: a graphical algorithm, 1989, SIGMOD '89.

[17] Marta Mattoso et al. Data Mining on Parallel Database Systems, 1998.