Large Scale Data Mining: Challenges and Responses

Data mining over large data sets is important because of its obvious commercial potential. However, it is also a major challenge because of its computational complexity. Exploiting the inherent parallelism of data mining algorithms offers a direct solution, since it harnesses the large data retrieval and processing power of parallel architectures. In this paper, we present results of our research on parallelising data mining algorithms. In particular, we present a methodology for determining the proper parallelisation strategy based on the idea of algorithmic skeletons and performance modelling. This research aims to provide a systematic way to develop parallel data mining algorithms and applications.
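As an illustration only, the following Python sketch shows what a data-parallel "partition, compute locally, combine" skeleton and a toy performance model might look like, applied to counting the support of candidate itemsets over a set of transactions. The function names (`partition`, `map_reduce_skeleton`, `count_support`, `predicted_time`, `best_processor_count`) and the cost model T(p) = n·t_item/p + t_overhead·p are hypothetical assumptions made for this example; they are not the skeletons or performance models developed in the paper.

```python
import operator
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Callable, List, Sequence


def partition(data: Sequence, p: int) -> List[Sequence]:
    """Split data into p contiguous blocks whose sizes differ by at most one."""
    if p < 1:
        raise ValueError("p must be at least 1")
    size, rem = divmod(len(data), p)
    blocks, start = [], 0
    for i in range(p):
        end = start + size + (1 if i < rem else 0)
        blocks.append(data[start:end])
        start = end
    return blocks


def map_reduce_skeleton(data: Sequence, p: int, local: Callable, combine: Callable):
    """Hypothetical skeleton: run `local` on each block concurrently, then fold with `combine`."""
    blocks = partition(data, p)
    # Threads keep the sketch simple; CPU-bound pure-Python work would need
    # processes (e.g. ProcessPoolExecutor) to obtain a real speed-up.
    with ThreadPoolExecutor(max_workers=p) as pool:
        partials = list(pool.map(local, blocks))
    return reduce(combine, partials)


def count_support(transactions, candidates) -> Counter:
    """Count how many transactions contain each candidate itemset."""
    counts = Counter()
    for t in transactions:
        items = set(t)
        for cand in candidates:
            if set(cand) <= items:
                counts[cand] += 1
    return counts


def predicted_time(p, n, t_item, t_overhead):
    """Toy cost model (assumption): evenly divided work plus overhead linear in p."""
    return n * t_item / p + t_overhead * p


def best_processor_count(n, t_item, t_overhead, max_p):
    """Pick the p in 1..max_p that minimises the modelled run time."""
    return min(range(1, max_p + 1),
               key=lambda p: predicted_time(p, n, t_item, t_overhead))


# Usage: support counting with two workers; partial Counters are merged by addition.
transactions = [("a", "b", "c"), ("a", "b"), ("b", "c"), ("a", "c"), ("a", "b", "c")]
candidates = [("a",), ("a", "b"), ("b", "c"), ("a", "c")]
support = map_reduce_skeleton(
    transactions, 2, lambda block: count_support(block, candidates), operator.add
)
print(support[("a",)], support[("a", "b")])  # 4 3
print(best_processor_count(n=10_000, t_item=1e-3, t_overhead=0.1, max_p=16))  # 10
```

Separating the skeleton (how work is distributed and combined) from the local computation and the cost model is what lets a strategy be chosen by evaluating predicted run times rather than by trial and error; under this toy model the optimum lies near p = sqrt(n·t_item / t_overhead).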
