Wavelet trees for all

The wavelet tree is a versatile data structure that serves a number of purposes, from string processing to geometry. It can be regarded as a device that represents a sequence, a reordering, or a grid of points. In addition, its space adapts to various entropy measures of the data it encodes, enabling compressed representations. New competitive solutions to a number of problems, based on wavelet trees, are appearing every year. In this survey we give an overview of wavelet trees and the surprising number of applications in which we have found them useful: basic and weighted point grids, sets of rectangles, strings, permutations, binary relations, graphs, inverted indexes, document retrieval indexes, full-text indexes, XML indexes, and general numeric sequences.
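As a rough illustration of how a wavelet tree represents a sequence, the following minimal sketch builds a pointer-based tree over integers and answers access and rank queries. It is an assumption-laden teaching example, not the survey's implementation: the class name `WaveletTree`, its methods, and the sample data are hypothetical, and plain Python lists of prefix counts stand in for the compressed bitmaps with rank/select support that practical versions would use.

```python
class WaveletTree:
    """Illustrative pointer-based wavelet tree over a non-empty integer sequence."""

    def __init__(self, seq, lo=None, hi=None):
        if lo is None:
            lo, hi = min(seq), max(seq)      # alphabet range at the root
        self.lo, self.hi = lo, hi
        self.left = self.right = None
        self.prefix = [0]                    # prefix[i] = symbols <= mid in seq[:i]
        if lo == hi or not seq:
            return                           # leaf (single symbol) or empty node
        mid = (lo + hi) // 2
        for x in seq:
            self.prefix.append(self.prefix[-1] + (x <= mid))
        # Stable partition: smaller half of the alphabet goes left, larger half right.
        self.left = WaveletTree([x for x in seq if x <= mid], lo, mid)
        self.right = WaveletTree([x for x in seq if x > mid], mid + 1, hi)

    def access(self, i):
        """Return the symbol at position i (0-based)."""
        node = self
        while node.left is not None:
            before = node.prefix[i]
            if node.prefix[i + 1] - before == 1:   # symbol was routed left
                i, node = before, node.left
            else:                                  # symbol was routed right
                i, node = i - before, node.right
        return node.lo

    def rank(self, c, i):
        """Return the number of occurrences of c in seq[:i]."""
        node = self
        if c < node.lo or c > node.hi:
            return 0
        while node.left is not None:
            mid = (node.lo + node.hi) // 2
            before = node.prefix[i]
            if c <= mid:
                i, node = before, node.left
            else:
                i, node = i - before, node.right
        return i


seq = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
wt = WaveletTree(seq)
print(wt.access(4), wt.rank(5, 11))  # prints "5 3"
```

Each query descends one root-to-leaf path, so its cost is proportional to the tree height, which is logarithmic in the alphabet size under this halving scheme.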
