Fast and Simple Parallel Wavelet Tree and Matrix Construction

The wavelet tree (Grossi et al. [SODA, 2003]) and wavelet matrix (Claude et al. [Inf. Syst., 47:15--32, 2015]) are compact indices for texts over an alphabet $[0,\sigma)$ that support rank, select, and access queries in $O(\lg \sigma)$ time. We first present new practical sequential and parallel algorithms for wavelet matrix construction. Their unifying characteristic is that they construct the wavelet matrix bottom-up, i.e., they compute the last level first. We also show that this bottom-up construction can easily be adapted to wavelet trees. In practice, our best sequential algorithm is up to twice as fast as the currently fastest sequential construction algorithm (serialWT), while using only half the space. On 4 cores, our best parallel algorithm is at least twice as fast as the currently fastest parallel algorithm (recWT), while also using less space. The algorithm scales up to 32 cores, where it is about as fast as recWT but still uses only about 75% of the space. An additional theoretical result shows how to adapt any wavelet tree construction algorithm to the wavelet matrix in the same (asymptotic) time, using only a little extra space.
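To make the data structure concrete, here is a minimal sketch of a wavelet matrix with access and rank queries. It is not the bottom-up construction described in the abstract: it builds the levels top-down in the textbook way. All function names (`build_wavelet_matrix`, `rank1`, `access`, `rank`) are illustrative assumptions. The bit vectors are plain Python lists with a linear-time `rank1`, so the queries here do not reach the $O(\lg \sigma)$ bound, which requires a constant-time rank structure on each level.

```python
def build_wavelet_matrix(text, sigma):
    """Naive top-down construction (illustrative; not the paper's algorithm).

    Returns one bit vector per level plus the number of zeros on each level.
    """
    levels = max(1, (sigma - 1).bit_length())  # ceil(lg sigma), at least 1
    bit_vectors, zeros = [], []
    current = list(text)
    for level in range(levels):
        shift = levels - 1 - level  # most significant bit first
        bits = [(c >> shift) & 1 for c in current]
        bit_vectors.append(bits)
        zeros.append(bits.count(0))
        # Stable partition: symbols with a 0 bit first, then those with a 1 bit.
        current = ([c for c, b in zip(current, bits) if b == 0]
                   + [c for c, b in zip(current, bits) if b == 1])
    return bit_vectors, zeros


def rank1(bits, i):
    """Number of 1 bits in bits[0:i]; linear time in this sketch."""
    return sum(bits[:i])


def access(wm, i):
    """Return the symbol at position i of the original text."""
    bit_vectors, zeros = wm
    symbol = 0
    for bits, z in zip(bit_vectors, zeros):
        b = bits[i]
        symbol = (symbol << 1) | b
        ones_before = rank1(bits, i)
        # Follow the position into the next level's ordering.
        i = z + ones_before if b else i - ones_before
    return symbol


def rank(wm, c, i):
    """Return the number of occurrences of symbol c in text[0:i]."""
    bit_vectors, zeros = wm
    levels = len(bit_vectors)
    start, end = 0, i
    for level, (bits, z) in enumerate(zip(bit_vectors, zeros)):
        b = (c >> (levels - 1 - level)) & 1
        if b:
            start = z + rank1(bits, start)
            end = z + rank1(bits, end)
        else:
            start = start - rank1(bits, start)
            end = end - rank1(bits, end)
    return end - start
```

For a text such as `[0, 1, 3, 7, 1, 5, 4, 2, 6, 3]` over $[0, 8)$, `build_wavelet_matrix(text, 8)` produces three levels, and `access` and `rank` can then be checked against the original list.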

[1] German Tischler. On Wavelet Tree Construction, 2011, CPM.

[2] Leo Ferres, et al. Parallel construction of wavelet trees on multicore architectures, 2016, Knowledge and Information Systems.

[3] Jeffrey Scott Vitter, et al. Fast Construction of Wavelet Trees, 2014, SPIRE.

[4] Maxim A. Babenko, et al. Wavelet Trees Meet Suffix Trees, 2015, SODA.

[5] Joseph JáJá, et al. An Introduction to Parallel Algorithms, 1992.

[6] Gonzalo Navarro, et al. The wavelet matrix: An efficient wavelet tree for large alphabets, 2015, Inf. Syst.

[7] Roberto Grossi, et al. High-order entropy-compressed text indexes, 2003, SODA '03.

[8] Alistair Moffat, et al. From Theory to Practice: Plug and Play with Succinct Data Structures, 2013, SEA.

[9] Gonzalo Navarro, et al. Wavelet trees for all, 2012, J. Discrete Algorithms.

[10] Gonzalo Navarro, et al. Position-Restricted Substring Searching, 2006, LATIN.

[11] Leo Ferres, et al. Efficient Wavelet Tree Construction and Querying for Multicore Architectures, 2014, SEA.

[12] Julian Shun, et al. Improved Parallel Construction of Wavelet Trees and Rank/Select Structures, 2016, 2017 Data Compression Conference (DCC).

[13] Paulo G. S. da Fonseca, et al. Online Construction of Wavelet Trees, 2017, SEA.

[14] Roberto Grossi, et al. Wavelet Trees: From Theory to Practice, 2011, 2011 First International Conference on Data Compression, Communications and Processing.

[15] Christos Makris, et al. Wavelet trees: A survey, 2012, Comput. Sci. Inf. Syst.

[16] Raffaele Giancarlo, et al. The myriad virtues of Wavelet Trees, 2009, Inf. Comput.

[17] Julian Shun, et al. Parallel Wavelet Tree Construction, 2014, 2015 Data Compression Conference.

[18] Patrick K. Nicholson, et al. Space Efficient Wavelet Tree Construction, 2011, SPIRE.

[19] Guy E. Blelloch, et al. Parallel lightweight wavelet tree, suffix array and FM-index construction, 2017, J. Discrete Algorithms.

[20] Gonzalo Navarro, et al. Rank and select revisited and extended, 2007, Theor. Comput. Sci.