Integer sorting on shared-memory vector parallel computers

This paper describes new fast integer sorting methods for single vector computers and shared-memory parallel vector computers, based on the bucket sort algorithm. Existing vectorization methods for bucket sort go to great lengths to avoid store conflicts in vector scatter operations, and are therefore not very efficient. The vectorization methods presented in this paper (the retry method, the split vector method and the mask vector method) all actively exploit the nature of store conflicts to achieve high performance. The parallelization method in this paper uses a feature of shared-memory machines to change the partitioning of histogram arrays dynamically without any overhead. By combining the retry method and the parallelization method, we obtained the world's fastest results for the IS program (Class B) in the NAS Parallel Benchmarks on the NEC SX-4. Our methods are also applicable to a wide range of particle simulation programs.
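To make the idea of exploiting store conflicts concrete, the following is a minimal sketch of one plausible reading of a retry-style histogram build; the abstract does not spell out the algorithm, so this is an illustration and not the authors' implementation. It assumes that when several elements of a vector scatter target the same address, exactly one write survives. The names `scatter`, `histogram_retry` and `bucket_sort` are hypothetical, and the vector operations are simulated with plain Python lists.

```python
def scatter(dest, indices, values):
    """Simulated vector scatter. Assumption: on duplicate indices exactly
    one write survives (in this simulation, the last one)."""
    for i, v in zip(indices, values):
        dest[i] = v


def histogram_retry(keys, num_buckets):
    """Count keys per bucket, retrying the elements that lose a store conflict."""
    hist = [0] * num_buckets
    tag = [-1] * num_buckets            # scratch array holding element ids
    pending = list(range(len(keys)))    # ids of elements not yet counted
    while pending:
        # Scatter each pending element's id to its bucket; conflicts are
        # allowed, so one id per touched bucket survives.
        scatter(tag, [keys[e] for e in pending], pending)
        # Gather back and compare: an element "wins" if its own id survived.
        winners = [e for e in pending if tag[keys[e]] == e]
        # Winners have pairwise distinct keys, so this update is conflict-free.
        win_keys = [keys[e] for e in winners]
        scatter(hist, win_keys, [hist[k] + 1 for k in win_keys])
        # Losers are retried in the next pass.
        pending = [e for e in pending if tag[keys[e]] != e]
    return hist


def bucket_sort(keys, num_buckets):
    """Sort small non-negative integer keys using the histogram above."""
    out = []
    for bucket, count in enumerate(histogram_retry(keys, num_buckets)):
        out.extend([bucket] * count)
    return out


if __name__ == "__main__":
    print(bucket_sort([3, 1, 3, 0, 3, 1], 4))
```

Each pass counts at least one element per distinct pending key, so the loop terminates; the number of passes equals the largest number of duplicates of any single key. On a real vector machine the passes would presumably run over fixed-length vector registers, which this sketch does not model.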