Linear time construction of compressed text indices in compact space

We show that the compressed suffix array and the compressed suffix tree of a string of length n over an integer alphabet of size σ ≤ n can both be built in O(n) (randomized) time using only O(n log σ) bits of working space. The fastest previous construction algorithms using O(n log σ) bits of space took O(n log log σ) time for the compressed suffix array and O(n log^ε n) time for the compressed suffix tree, where ε is any positive constant smaller than 1.
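To give a feel for what "compact space" means here, the following minimal sketch compares the leading terms n log σ (the bits needed to store the text itself) and n log n (the bits needed by a plain, uncompressed suffix array). The function names and the DNA-sized example (n = 3 × 10^9, σ = 4) are illustrative assumptions, not taken from the paper, and the constants hidden by the O-notation are ignored.

```python
def ceil_log2(x: int) -> int:
    """Smallest b with 2**b >= x, computed exactly on integers (at least 1)."""
    return max(1, (x - 1).bit_length())


def packed_text_bits(n: int, sigma: int) -> int:
    """Leading term n * ceil(log2 sigma): the text packed symbol by symbol."""
    return n * ceil_log2(sigma)


def plain_suffix_array_bits(n: int) -> int:
    """Leading term n * ceil(log2 n): one text position per suffix."""
    return n * ceil_log2(n)


if __name__ == "__main__":
    # Hypothetical example: a genome-sized text over a 4-letter alphabet.
    n, sigma = 3_000_000_000, 4
    text_bits = packed_text_bits(n, sigma)
    sa_bits = plain_suffix_array_bits(n)
    print(f"packed text:        {text_bits // 8:>15,} bytes")
    print(f"plain suffix array: {sa_bits // 8:>15,} bytes")
    print(f"ratio:              {sa_bits / text_bits:.1f}x")
```

Under these assumptions the plain suffix array is a factor of log n / log σ larger than the packed text. That gap is why a construction whose working space stays within O(n log σ) bits cannot simply materialize the full suffix array as an intermediate step.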
