An(other) Entropy-Bounded Compressed Suffix Tree

Suffix trees are among the most important data structures in stringology, with a myriad of applications. Their main drawback is space usage, which has triggered much research into compressed representations that remain functional. We present a novel compressed suffix tree. It is the first to achieve, at the same time, sublogarithmic complexity for the operations and space usage that goes to zero as the entropy of the text does. Our development contains several novel ideas, such as compressing the longest common prefix information and dispensing entirely with the suffix tree topology: all suffix tree operations are expressed using range minimum queries and a new primitive, the next/previous smaller value in a sequence.
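
To make the last idea concrete, the following is a minimal sketch of how suffix tree navigation can be phrased purely in terms of the longest common prefix (LCP) array, with no explicit tree. It assumes the usual view of a suffix tree node as an interval [l, r] of the suffix array, and it uses naive linear scans for the range minimum and next/previous smaller value queries, not the compressed sublogarithmic structures the abstract refers to. All function names (`suffix_array`, `lcp_array`, `psv`, `nsv`, `string_depth`, `parent`) and the sentinel convention are illustrative assumptions, not taken from the paper.

```python
def suffix_array(text):
    """Naive suffix array: start positions of suffixes in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """lcp[i] = length of the common prefix of suffixes sa[i-1] and sa[i].

    Assumed convention: lcp has length n + 1 and lcp[0] = lcp[n] = -1
    act as sentinels, so smaller-value searches always terminate.
    """
    n = len(sa)
    lcp = [-1] * (n + 1)
    for i in range(1, n):
        a, b = text[sa[i - 1]:], text[sa[i]:]
        k = 0
        while k < min(len(a), len(b)) and a[k] == b[k]:
            k += 1
        lcp[i] = k
    return lcp


def psv(lcp, k):
    """Previous smaller value: largest j < k with lcp[j] < lcp[k]."""
    j = k - 1
    while j >= 0 and lcp[j] >= lcp[k]:
        j -= 1
    return j


def nsv(lcp, k):
    """Next smaller value: smallest j > k with lcp[j] < lcp[k]."""
    j = k + 1
    while j < len(lcp) and lcp[j] >= lcp[k]:
        j += 1
    return j


def string_depth(lcp, l, r):
    """String depth of an internal node [l, r] (l < r): a range minimum."""
    return min(lcp[l + 1:r + 1])


def parent(lcp, l, r):
    """Parent interval of node [l, r], or None for the root."""
    n = len(lcp) - 1
    if l == 0 and r == n - 1:
        return None
    # The larger of the two boundary LCP values is the parent's string depth.
    k = l if lcp[l] > lcp[r + 1] else r + 1
    return psv(lcp, k), nsv(lcp, k) - 1


text = "banana$"
sa = suffix_array(text)
lcp = lcp_array(text, sa)
print(sa)                       # suffix array of the example text
print(parent(lcp, 3, 3))        # parent of the leaf for "anana$"
print(string_depth(lcp, 1, 3))  # string depth of the node spelling "a"
```

In this sketch, `parent` never inspects a tree: it only compares two LCP values and issues one previous and one next smaller value query. Replacing the linear scans with succinct query structures over a compressed LCP array is the step that the sketch leaves out.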
