Succinct representations of lcp information and improvements in the compressed suffix arrays

We introduce two succinct data structures for solving various string problems. The first stores <i>lcp</i> information, that is, the lengths of the longest common prefixes between suffixes in the suffix array. The second is an improved compressed suffix array that supports linear-time counting queries for any pattern. The first occupies only 2<i>n</i> + <i>o</i>(<i>n</i>) bits for a text of length <i>n</i> when computing the <i>lcp</i> between suffixes that are adjacent in lexicographic order in constant time, and 6<i>n</i> + <i>o</i>(<i>n</i>) bits when computing it between any two suffixes. No data structure previously reported in the literature attained linear size. The second has size proportional to the text size and is applicable to texts over any alphabet Σ such that |Σ| = log<sup><i>O</i>(1)</sup> <i>n</i>. These space-economical data structures are useful for processing huge amounts of text data.
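
The abstract does not say how the 2<i>n</i>-bit bound is reached, so the following is only an illustrative sketch of one well-known way to fit adjacent-suffix lcp values into at most 2<i>n</i> bits; it may differ from the construction in the paper. It relies on the fact that, with the lcp values indexed by text position <i>i</i> (called `plcp` here), the quantity `plcp[i] + i` never decreases, so the successive differences can be written in unary. All function names are hypothetical, the suffix array is built naively, and `select1` is a linear scan standing in for the <i>o</i>(<i>n</i>)-bit constant-time select structure that a real implementation would need.

```python
def suffix_array(text):
    # Naive construction for illustration only (not linear time).
    return sorted(range(len(text)), key=lambda i: text[i:])


def common_prefix_length(a, b):
    k = 0
    while k < len(a) and k < len(b) and a[k] == b[k]:
        k += 1
    return k


def encode_lcp_bits(text, sa):
    """Encode adjacent-suffix lcp values in at most 2n bits (assumed scheme)."""
    n = len(text)
    # plcp[i] = lcp of suffix i with its lexicographic predecessor.
    plcp = [0] * n
    for r in range(1, n):
        plcp[sa[r]] = common_prefix_length(text[sa[r - 1]:], text[sa[r]:])
    bits = []
    prev = 0
    for i in range(n):
        value = plcp[i] + i            # non-decreasing in i
        bits.extend([0] * (value - prev))  # unary-coded difference
        bits.append(1)                 # terminator for position i
        prev = value
    return bits


def select1(bits, k):
    """Position of the k-th one (1-based). Linear scan placeholder."""
    seen = 0
    for pos, bit in enumerate(bits):
        seen += bit
        if seen == k:
            return pos
    raise IndexError("fewer than k ones in the bit vector")


def adjacent_lcp(bits, sa, r):
    """lcp between the suffixes of rank r-1 and r (0 for r = 0)."""
    i = sa[r]
    # The (i+1)-th one sits after plcp[i] + i zeros and i earlier ones.
    return select1(bits, i + 1) - 2 * i


text = "banana"
sa = suffix_array(text)
bits = encode_lcp_bits(text, sa)
lcps = [adjacent_lcp(bits, sa, r) for r in range(len(text))]
```

For `"banana"` the bit vector has 11 bits, within the 2<i>n</i> = 12 bound. Note that each query still needs the suffix array entry `sa[r]`, which in this sketch is a plain Python list.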
