Compressed Suffix Trees with Full Functionality

AbstractWe introduce new data structures for compressed suffix trees whose size are linear in the text size. The size is measured in bits; thus they occupy only O(n log|A|) bits for a text of length n on an alphabet A. This is a remarkable improvement on current suffix trees which require O(n log n) bits. Though some components of suffix trees have been compressed, there is no linear-size data structure for suffix trees with full functionality such as computing suffix links, string-depths and lowest common ancestors. The data structure proposed in this paper is the first one that has linear size and supports all operations efficiently. Any algorithm running on a suffix tree can also be executed on our compressed suffix trees with a slight slowdown of a factor of polylog(n).

[1]  Peter Weiner,et al.  Linear Pattern Matching Algorithms , 1973, SWAT.

[2]  Peter Elias,et al.  Efficient Storage and Retrieval by Content and Address of Static Files , 1974, JACM.

[3]  Edward M. McCreight,et al.  A Space-Economical Suffix Tree Construction Algorithm , 1976, JACM.

[4]  Eugene W. Myers,et al.  Suffix arrays: a new method for on-line string searches , 1993, SODA '90.

[5]  Gaston H. Gonnet,et al.  New Indices for Text: Pat Trees and Pat Arrays , 1992, Information Retrieval: Data Structures & Algorithms.

[6]  David R. Clark,et al.  Efficient suffix trees on secondary storage , 1996, SODA '96.

[7]  Dan Gusfield,et al.  Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology , 1997 .

[8]  Martin Farach-Colton,et al.  Optimal Suffix Tree Construction with Large Alphabets , 1997, FOCS.

[9]  Torben Hagerup,et al.  Sorting and Searching on the Word RAM , 1998, STACS.

[10]  Stefan Kurtz,et al.  Reducing the space requirement of suffix trees , 1999, Softw. Pract. Exp..

[11]  Faith Ellen,et al.  Optimal bounds for the predecessor problem , 1999, STOC '99.

[12]  Giovanni Manzini,et al.  Opportunistic data structures with applications , 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[13]  Michael A. Bender,et al.  The LCA Problem Revisited , 2000, LATIN.

[14]  John L. Smith Tables , 1969, Neuromuscular Disorders.

[15]  J. Ian Munro,et al.  Succinct Representation of Balanced Parentheses and Static Trees , 2002, SIAM J. Comput..

[16]  Giovanni Manzini,et al.  An experimental study of an opportunistic index , 2001, SODA '01.

[17]  S. Srinivasa Rao,et al.  Space Efficient Suffix Trees , 1998, J. Algorithms.

[18]  Hiroki Arimura,et al.  Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications , 2001, CPM.

[19]  Siu-Ming Yiu,et al.  A Space and Time Efficient Algorithm for Constructing Compressed Suffix Arrays , 2002, COCOON.

[20]  Giovanni Manzini,et al.  On compressing and indexing data , 2002 .

[21]  Wing-Kai Hon,et al.  Constructing Compressed Suffix Arrays with Large Alphabets , 2003, ISAAC.

[22]  Kunihiko Sadakane,et al.  New text indexing functionalities of the compressed suffix arrays , 2003, J. Algorithms.

[23]  Esko Ukkonen,et al.  On-line construction of suffix trees , 1995, Algorithmica.

[24]  Roberto Grossi,et al.  Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching , 2005, SIAM J. Comput..

[25]  Maxime Crochemore,et al.  Algorithms on strings , 2007 .