Succinct Text Indexes on Large Alphabet

In this paper, we first consider some properties of strings that have the same suffix array. Next, we design a data structure that supports rank and select operations over an alphabet Σ for a text of length n, using n log|Σ| + o(n log|Σ|) bits and O(log|Σ|) time per operation. It also supports an extended rank operation, rank≤, where rank≤_α(T, i) returns the number of letters in T that are smaller than α plus the number of occurrences of α up to position i of T; this operation also runs in O(log|Σ|) time. Using this structure, we implement the DAWG (directed acyclic word graph) succinctly: the main structure takes only n log|Σ| + o(n log|Σ|) bits and supports the basic DAWG operations efficiently.
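To make the semantics of rank, select and rank≤ concrete, here is a minimal Python sketch. It uses a wavelet tree, a standard structure that answers these queries in O(log|Σ|) time per operation; it is not necessarily the structure designed in this paper, and it is not succinct: plain Python lists stand in for bit vectors with o(n)-bit rank/select directories. Positions are 0-based, symbols are integers in [0, σ), and the names WaveletTree, rank_less and rank_leq are hypothetical.

```python
from typing import List, Optional


class _Node:
    """Wavelet-tree node covering the symbol range [lo, hi)."""

    def __init__(self, seq: List[int], lo: int, hi: int) -> None:
        self.lo, self.hi, self.size = lo, hi, len(seq)
        self.left: Optional["_Node"] = None
        self.right: Optional["_Node"] = None
        if hi - lo <= 1 or not seq:
            return  # leaf (single symbol) or empty subtree
        self.mid = (lo + hi) // 2
        # bit k is 1 iff seq[k] >= mid, i.e. it is routed to the right child
        bits = [1 if c >= self.mid else 0 for c in seq]
        # ones[k] = number of 1-bits in bits[0:k]  -> O(1) bit-vector rank
        self.ones = [0]
        for b in bits:
            self.ones.append(self.ones[-1] + b)
        # pos[b][j] = index of the (j+1)-th b-bit  -> O(1) bit-vector select
        self.pos = ([k for k, b in enumerate(bits) if b == 0],
                    [k for k, b in enumerate(bits) if b == 1])
        self.left = _Node([c for c in seq if c < self.mid], lo, self.mid)
        self.right = _Node([c for c in seq if c >= self.mid], self.mid, hi)


class WaveletTree:
    """rank / select / rank_less / rank_leq over integer symbols in [0, sigma)."""

    def __init__(self, text: List[int], sigma: int) -> None:
        if sigma < 1 or any(not 0 <= c < sigma for c in text):
            raise ValueError("symbols must lie in [0, sigma)")
        self.n, self.sigma = len(text), sigma
        self.root = _Node(list(text), 0, sigma)

    def rank(self, c: int, i: int) -> int:
        """Occurrences of c in text[0:i]; one O(1) step per tree level."""
        if not 0 <= c < self.sigma:
            return 0
        i, node = max(0, min(i, self.n)), self.root
        while node.left is not None:
            ones = node.ones[i]
            if c < node.mid:
                i, node = i - ones, node.left
            else:
                i, node = ones, node.right
        return i

    def rank_less(self, c: int, i: int) -> int:
        """Number of symbols smaller than c in text[0:i]."""
        i = max(0, min(i, self.n))
        if c <= 0:
            return 0
        if c >= self.sigma:
            return i
        count, node = 0, self.root
        while node.left is not None:
            ones = node.ones[i]
            if c < node.mid:
                i, node = i - ones, node.left
            else:
                count += i - ones  # everything routed left is < mid <= c
                i, node = ones, node.right
        return count

    def rank_leq(self, c: int, i: int) -> int:
        """rank≤: symbols < c in the whole text plus occurrences of c in text[0:i]."""
        return self.rank_less(c, self.n) + self.rank(c, i)

    def select(self, c: int, j: int) -> Optional[int]:
        """0-based position of the j-th (j >= 1) occurrence of c, or None."""
        if not 0 <= c < self.sigma or j < 1:
            return None
        path, node = [], self.root
        while node.left is not None:
            went_right = c >= node.mid
            path.append((node, went_right))
            node = node.right if went_right else node.left
        if node.size < j:
            return None
        p = j - 1  # index inside the leaf
        for parent, went_right in reversed(path):
            p = parent.pos[1 if went_right else 0][p]  # map up one level
        return p


if __name__ == "__main__":
    T = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
    wt = WaveletTree(T, sigma=10)
    print(wt.rank(5, 9), wt.select(5, 3), wt.rank_leq(5, 9))  # 2 10 8
```

In the example, rank≤ for α = 5 and i = 9 gives 8: six letters of T are smaller than 5, and two 5s occur among the first nine positions. In this sketch rank_leq is simply rank_less over the whole text plus rank, i.e. two root-to-leaf walks, so it stays within O(log|Σ|) time.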
