Improved Parallel Construction of Wavelet Trees and Rank/Select Structures

Existing parallel algorithms for wavelet tree construction have a work complexity of $O(n \log \sigma)$. This paper presents parallel algorithms for the problem with improved work complexity. Our first algorithm is based on parallel integer sorting and has either $O(n \log\log n \lceil \log \sigma / \sqrt{\log n \log\log n} \rceil)$ work and polylogarithmic depth, or $O(n \lceil \log \sigma / \sqrt{\log n} \rceil)$ work and sub-linear depth. We also describe another algorithm that has $O(n \lceil \log \sigma / \sqrt{\log n} \rceil)$ work and $O(\sigma + \log n)$ depth. We then show how to use similar ideas to construct variants of wavelet trees (arbitrary-shaped binary trees and multiary trees) as well as wavelet matrices in parallel with lower work complexity than prior algorithms. Finally, we show that the rank and select structures on binary sequences and multiary sequences, which are stored on wavelet tree nodes, can be constructed in parallel with improved work bounds, matching those of the best existing sequential algorithms for constructing rank and select structures.
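For orientation, the $O(n \log \sigma)$ baseline mentioned above can be illustrated with a minimal sequential sketch of level-by-level wavelet tree construction. This is not the algorithm of the paper: it is the textbook construction, written under the assumption that symbols are integers in $[0, \sigma)$ and that the tree is stored as one bitmap per level. The names `build_levels` and `rank` are hypothetical, and `rank` scans the bitmaps linearly instead of using the rank/select structures the abstract refers to.

```python
def build_levels(seq, sigma):
    """Build one bitmap per level for symbols in [0, sigma).

    Each level costs O(n + sigma) here, giving roughly O(n log sigma)
    work over the ceil(log2(sigma)) levels (the sequential baseline).
    """
    nbits = max(1, (sigma - 1).bit_length())
    levels = []
    cur = list(seq)
    for lvl in range(nbits):
        shift = nbits - 1 - lvl
        # The bitmap of this level holds bit number `shift` of every symbol.
        levels.append([(c >> shift) & 1 for c in cur])
        # Stable bucketing by the top (lvl + 1) bits keeps the symbols of
        # each tree node contiguous and in their original relative order.
        buckets = [[] for _ in range(1 << (lvl + 1))]
        for c in cur:
            buckets[c >> shift].append(c)
        cur = [c for bucket in buckets for c in bucket]
    return levels


def rank(levels, c, i):
    """Count occurrences of symbol c among the first i input symbols.

    Naive version: counts bits by scanning, so it is only meant to
    check that the bitmaps are laid out correctly.
    """
    nbits = len(levels)
    lo, hi = 0, len(levels[0])  # range of the current node in its level
    for lvl in range(nbits):
        bits = levels[lvl]
        b = (c >> (nbits - 1 - lvl)) & 1
        zeros_in_node = bits[lo:hi].count(0)
        ones_before = sum(bits[lo:lo + i])
        if b == 0:
            i -= ones_before          # keep only the 0-bit positions
            hi = lo + zeros_in_node   # left child occupies the front
        else:
            i = ones_before           # keep only the 1-bit positions
            lo = lo + zeros_in_node   # right child follows the zeros
    return i


seq = [3, 1, 0, 2, 3, 1, 1]
levels = build_levels(seq, sigma=4)
```

With this layout, `rank(levels, c, i)` should agree with `seq[:i].count(c)` for every symbol `c` and prefix length `i`. The improved bounds in the abstract come from processing many bits per symbol at once, which this sketch does not attempt.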
