Perfectly load-balanced, optimal, stable, parallel merge

We present a simple, work-optimal and synchronization-free solution to the problem of stably merging, in parallel, two ordered arrays of m and n elements into a single ordered array of m+n elements. The main contribution is a new, simple, fast and direct algorithm that determines, for any prefix of the stably merged output sequence, the exact prefixes of each of the two input sequences needed to produce this output prefix. More precisely, given any index (rank) in the not yet constructed output array, representing an output prefix, the algorithm computes the indices (co-ranks) in each of the two input arrays that delimit the required input prefixes, without merging the input arrays. The co-ranking algorithm takes O(log min(m,n)) time steps. It is used to devise a perfectly load-balanced, stable, parallel merge algorithm in which each of p processing elements has exactly the same number of input elements to merge. Compared to other approaches to the parallel merge problem, our algorithm is considerably simpler and can be up to a factor of two faster. Compared to previous algorithms for solving the co-ranking problem, the algorithm given here is direct and maintains stability in the presence of repeated elements at no extra space or time cost. When the number of processing elements p does not exceed (m+n)/log min(m,n), the parallel merge algorithm has optimal speedup. It is easy to implement on both shared- and distributed-memory parallel systems.
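To make the co-ranking idea concrete, here is a minimal Python sketch of how co-ranks can drive a load-balanced stable merge. It assumes co-ranking is done by a binary search over the split point j (with k = i - j) that checks the stability conditions a[j-1] <= b[k] and b[k-1] < a[j], so that equal elements from the first array precede those from the second. The search interval for j starts with width at most min(m, n), which is where a logarithmic step count comes from. This is one way to compute co-ranks and is not necessarily the authors' exact procedure. The names `co_rank`, `merge_block` and `parallel_merge`, the `key` parameter, and the block split i_r = floor(r(m+n)/p) are hypothetical, illustrative choices (with this split, block sizes differ by at most one when p does not divide m+n). The p processing elements are simulated by a sequential loop. Each iteration reads only the inputs and writes a disjoint slice of the output, so the iterations need no synchronization.

```python
from typing import Any, Callable, List, Sequence, Tuple

Key = Callable[[Any], Any]


def _identity(x: Any) -> Any:
    return x


def co_rank(i: int, a: Sequence[Any], b: Sequence[Any],
            key: Key = _identity) -> Tuple[int, int]:
    """Hypothetical helper: return (j, k), j + k == i, such that the first i
    elements of the stable merge of a and b are a[:j] and b[:k]
    (ties: elements of a come before equal elements of b)."""
    m, n = len(a), len(b)
    if not 0 <= i <= m + n:
        raise ValueError("rank out of range")
    j = min(i, m)            # start with the largest feasible j
    k = i - j
    j_low = max(0, i - n)    # j can never be smaller than this
    k_low = max(0, i - m)    # k can never be smaller than this
    while True:
        if j > 0 and k < n and key(a[j - 1]) > key(b[k]):
            # b[k] precedes a[j-1] in the merge, so j is too large: shrink it.
            delta = (j - j_low + 1) // 2   # ceil((j - j_low) / 2)
            k_low = k
            j -= delta
            k += delta
        elif k > 0 and j < m and key(b[k - 1]) >= key(a[j]):
            # a[j] precedes b[k-1] (ties favour a), so k is too large.
            delta = (k - k_low + 1) // 2   # ceil((k - k_low) / 2)
            j_low = j
            j += delta
            k -= delta
        else:
            return j, k


def merge_block(a: Sequence[Any], b: Sequence[Any], j0: int, j1: int,
                k0: int, k1: int, out: List[Any], i0: int, key: Key) -> None:
    """Sequentially and stably merge a[j0:j1] and b[k0:k1] into out[i0:]."""
    j, k, i = j0, k0, i0
    while j < j1 and k < k1:
        if key(a[j]) <= key(b[k]):   # on ties take from a: keeps stability
            out[i] = a[j]
            j += 1
        else:
            out[i] = b[k]
            k += 1
        i += 1
    while j < j1:                    # copy whatever remains of a's block
        out[i] = a[j]
        j += 1
        i += 1
    while k < k1:                    # copy whatever remains of b's block
        out[i] = b[k]
        k += 1
        i += 1


def parallel_merge(a: Sequence[Any], b: Sequence[Any], p: int,
                   key: Key = _identity) -> List[Any]:
    """Stable merge split into p independent, equal-sized output blocks."""
    if p < 1:
        raise ValueError("p must be positive")
    m, n = len(a), len(b)
    out: List[Any] = [None] * (m + n)
    for r in range(p):  # each iteration could run on its own PE
        i0 = r * (m + n) // p
        i1 = (r + 1) * (m + n) // p
        j0, k0 = co_rank(i0, a, b, key)
        j1, k1 = co_rank(i1, a, b, key)
        merge_block(a, b, j0, j1, k0, k1, out, i0, key)
    return out
```

For example, `co_rank(3, [1, 3, 5], [2, 4, 6])` returns `(2, 1)`, since the first three merged elements are 1, 2, 3, and `parallel_merge([1, 3, 5], [2, 4, 6], 2)` returns `[1, 2, 3, 4, 5, 6]`, with the first processing element merging `a[0:2]` with `b[0:1]` and the second merging `a[2:3]` with `b[1:3]`.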