A CUDA-MPI Hybrid Bitonic Sorting Algorithm for GPU Clusters
We present a hybrid CUDA-MPI sorting algorithm that uses GPU clusters to sort large data sets. The algorithm has two phases. In the first phase, each node sorts its portion of the data on its GPU using a parallel bitonic sort. In the second phase, the sorted subsequences are merged in parallel by a reduction sorting network implemented in MPI across the cluster nodes. Compared with a sequential quicksort, our algorithm achieves speed-ups of up to 9.8 when sorting 4 GB of data on a 32-node GPU cluster. We anticipate even better speed-ups on larger data sets and larger clusters.
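The two phases can be illustrated with a small sequential sketch in Python. It is not the authors' CUDA-MPI implementation: the cluster nodes are simulated by list slices, the function names (`bitonic_sort`, `reduce_merge`, `hybrid_sort`) are hypothetical, and the pairwise tree reduction with `heapq.merge` is only an assumption about how a reduction-style merge across nodes might be organized. The sketch also assumes the data divide evenly among the nodes and that each node's portion has a power-of-two length, as a basic bitonic network requires.

```python
from heapq import merge


def bitonic_sort(values):
    """Phase 1 (per node): sort a power-of-two-length list with a bitonic network."""
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # compare-exchange distance within this stage
            # On a GPU, each index i would typically map to its own thread.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


def reduce_merge(chunks):
    """Phase 2 (assumed shape): merge sorted chunks pairwise in a reduction tree."""
    chunks = [list(c) for c in chunks]
    step = 1
    while step < len(chunks):
        for rank in range(0, len(chunks) - step, 2 * step):
            # With MPI, rank + step would send its chunk to rank, which merges it.
            chunks[rank] = list(merge(chunks[rank], chunks[rank + step]))
            chunks[rank + step] = []
        step *= 2
    return chunks[0] if chunks else []


def hybrid_sort(data, num_nodes):
    """Split data across simulated nodes, sort each part, then reduce-merge."""
    chunk = len(data) // num_nodes
    if chunk * num_nodes != len(data):
        raise ValueError("data must divide evenly among the nodes")
    parts = [bitonic_sort(data[r * chunk:(r + 1) * chunk]) for r in range(num_nodes)]
    return reduce_merge(parts)
```

For example, `hybrid_sort(data, 4)` on a 16-element list sorts four 4-element chunks independently and then merges them in two rounds, with half of the simulated nodes dropping out after each round.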