Parallelization of a Hierarchical Data Clustering Algorithm Using OpenMP

This paper presents a parallel implementation of CURE, an efficient hierarchical data clustering algorithm, using the OpenMP programming model. OpenMP transparently manages the asymmetry and non-determinism in CURE, while our OpenMP runtime support enables effective exploitation of its irregular nested loop-level parallelism. Experimental results for various problem parameters demonstrate that our implementation scales and uses parallel hardware effectively, making CURE practical for large data sets.
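
To illustrate what irregular loop-level parallelism can look like, the following is a minimal sketch, not the implementation described in the paper. It assumes a simplified setting in which every cluster is a single 2-D point and searches for the closest pair, the step an agglomerative hierarchical algorithm repeats before each merge. The names `point_t`, `pair_t`, `dist2`, and `closest_pair` are hypothetical. The inner loop is triangular, so outer iterations carry unequal work; the sketch uses `schedule(dynamic)` for that reason and parallelizes only the outer loop, without nested parallel regions.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical types for illustration; not taken from the paper. */
typedef struct { double x, y; } point_t;
typedef struct { size_t i, j; double dist2; } pair_t;

/* Squared Euclidean distance between two points. */
static double dist2(point_t a, point_t b) {
    double dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

/* Return the indices and squared distance of the closest pair of points.
 * Iteration i of the outer loop does n - 1 - i distance computations, so
 * the work per iteration is uneven; dynamic scheduling hands iterations
 * to threads as they become free. With ties, which pair is returned may
 * depend on thread timing. */
pair_t closest_pair(const point_t *pts, size_t n) {
    assert(n >= 2);
    pair_t best = { 0, 1, DBL_MAX };

    #pragma omp parallel
    {
        /* Each thread tracks its own minimum to avoid contention. */
        pair_t local = { 0, 1, DBL_MAX };

        #pragma omp for schedule(dynamic) nowait
        for (long i = 0; i < (long)n - 1; i++) {
            for (size_t j = (size_t)i + 1; j < n; j++) {
                double d = dist2(pts[i], pts[j]);
                if (d < local.dist2) {
                    local.i = (size_t)i;
                    local.j = j;
                    local.dist2 = d;
                }
            }
        }

        /* Combine the per-thread minima one thread at a time. */
        #pragma omp critical
        {
            if (local.dist2 < best.dist2) best = local;
        }
    }
    return best;
}
```

Compiled without OpenMP support, the pragmas are ignored and the function runs serially with the same result, which is convenient for checking correctness before measuring speedup.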
