Parallel Wavelet Tree Construction

We present parallel algorithms for wavelet tree construction with polylogarithmic depth, improving upon the linear depth of the recent parallel algorithms by Fuentes-Sepulveda et al. Experiments on a 40-core machine with two-way hyper-threading show that our algorithms outperform the existing parallel algorithms by 1.3x to 5.6x and achieve up to 27x speedup over the sequential algorithm on a variety of real-world and artificial inputs. Our algorithms scale well with increasing thread count, input size, and alphabet size. We also discuss extensions to variants of the standard wavelet tree.
