Engineering Parallel String Sorting

We discuss how string sorting algorithms can be parallelized on modern multi-core shared-memory machines. As a synthesis of the best sequential string sorting algorithms and successful parallel sorting algorithms for atomic objects, we first propose string sample sort. The algorithm makes effective use of the memory hierarchy, uses additional word-level parallelism, and largely avoids branch mispredictions. We then focus on NUMA architectures and develop parallel multiway LCP-merge and LCP-mergesort, where LCP stands for longest common prefix, to reduce the number of random memory accesses to remote nodes. Additionally, we parallelize variants of multikey quicksort and radix sort, which are also useful in certain situations. As a base-case sorter for LCP-aware string sorting, we describe sequential LCP-insertion sort, which computes the LCP array and uses it to accelerate its insertions. Finally, we report and discuss comprehensive experiments on five current multi-core platforms. The experiments show that our parallel string sorting implementations scale very well on real-world inputs and modern machines.
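
To illustrate the idea behind LCP-aware merging, the following is a minimal sequential sketch of a two-way LCP-merge and a mergesort built on it. It is not the parallel multiway algorithm of the paper. The function names (`lcp_compare`, `lcp_merge`, `lcp_mergesort`) and the convention that `lcp[i]` holds the longest common prefix of entries `i-1` and `i` (with `lcp[0] = 0`) are assumptions made for this example. The point it shows is that, when the two run heads have different LCPs with the last string written, the next string can be chosen without inspecting any characters.

```python
def lcp_compare(a, b, start):
    """Compare a and b, which are known to agree on their first `start` characters.

    Returns (cmp, h): cmp is negative, zero or positive, and h = lcp(a, b).
    """
    h = start
    n = min(len(a), len(b))
    while h < n and a[h] == b[h]:
        h += 1
    if h == len(a) or h == len(b):
        # One string is a prefix of the other: the shorter one sorts first.
        return len(a) - len(b), h
    return (-1 if a[h] < b[h] else 1), h


def lcp_merge(a, lcp_a, b, lcp_b):
    """Merge two sorted string lists, each with its LCP array (assumed layout)."""
    out, out_lcp = [], []
    i = j = 0
    ha = hb = 0  # LCP of each run's head with the last string written to `out`
    while i < len(a) and j < len(b):
        if ha != hb:
            # The head sharing the longer prefix with the last output is smaller;
            # the other head's LCP with the new last output stays unchanged.
            take_a = ha > hb
            h = min(ha, hb)
        else:
            # Equal LCPs: compare characters, skipping the known common prefix.
            cmp, h = lcp_compare(a[i], b[j], ha)
            take_a = cmp <= 0
        if take_a:
            out.append(a[i])
            out_lcp.append(ha)
            hb = h
            i += 1
            if i < len(a):
                ha = lcp_a[i]
        else:
            out.append(b[j])
            out_lcp.append(hb)
            ha = h
            j += 1
            if j < len(b):
                hb = lcp_b[j]
    # Copy the rest of the unfinished run; only its head needs the tracked LCP.
    if i < len(a):
        out.extend(a[i:])
        out_lcp.append(ha)
        out_lcp.extend(lcp_a[i + 1:])
    if j < len(b):
        out.extend(b[j:])
        out_lcp.append(hb)
        out_lcp.extend(lcp_b[j + 1:])
    return out, out_lcp


def lcp_mergesort(strings):
    """Sort strings and return (sorted list, LCP array) using lcp_merge."""
    if len(strings) <= 1:
        return list(strings), [0] * len(strings)
    mid = len(strings) // 2
    left, lcp_left = lcp_mergesort(strings[:mid])
    right, lcp_right = lcp_mergesort(strings[mid:])
    return lcp_merge(left, lcp_left, right, lcp_right)


if __name__ == "__main__":
    words = ["banana", "band", "ban", "apple", "bandana", "app", "ban"]
    sorted_words, lcps = lcp_mergesort(words)
    print(sorted_words)
    print(lcps)
```

In this sketch the character loop in `lcp_compare` runs only when both heads share the same LCP with the last output, and it starts at that LCP rather than at position zero. A multiway, NUMA-aware variant would need additional machinery that is not shown here.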
