Persistency in Suffix Trees with Applications to String Interval Problems

The suffix tree has proven to be an invaluable indexing data structure and is widely used as a building block in many applications. We study the problem of making a suffix tree persistent. Specifically, consider a streamed text T in which characters are prepended to the beginning of the text, and the suffix tree is updated after each prepended character. We wish to allow access to any previous version of the suffix tree. While basic persistence for suffix trees can be supported using classical persistence techniques, some applications that could make use of this persistency cannot be solved efficiently using these techniques alone. One such collection of problems consists of queries on string intervals of the text indexed by the suffix tree. In other words, if the text T = t_1...t_n is indexed, one may want to answer queries on string intervals t_i...t_j of the text. Problems of this type are known as position-restricted and include querying, reporting, rank, and selection. Persistency can be used to solve these problems on prefixes of the text by solving them on previous versions of the suffix tree. For general substrings, however, standard persistency is not sufficient. We propose more sophisticated persistent techniques that yield solutions for position-restricted querying, reporting, rank, and selection problems.
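
To make the problem statement concrete, the following is a minimal sketch of position-restricted reporting and counting, written as a naive scan over the interval t_i...t_j. It is not the persistent suffix tree technique proposed here; it only illustrates what a query asks for. The function names `restricted_report` and `restricted_count` and the 1-indexed, inclusive interval convention are assumptions made for this illustration.

```python
def restricted_report(text: str, pattern: str, i: int, j: int) -> list[int]:
    """Return the 1-indexed start positions of all occurrences of `pattern`
    that lie entirely inside the string interval t_i...t_j of `text`.

    Naive baseline for illustration only: it scans the interval directly
    instead of using an index such as a suffix tree.
    """
    m = len(pattern)
    window = text[i - 1:j]  # the interval t_i...t_j (1-indexed, inclusive)
    return [
        i + k
        for k in range(len(window) - m + 1)
        if window.startswith(pattern, k)
    ]


def restricted_count(text: str, pattern: str, i: int, j: int) -> int:
    """Number of occurrences of `pattern` inside t_i...t_j."""
    return len(restricted_report(text, pattern, i, j))
```

For example, in the text `abracadabra` the pattern `abra` occurs at positions 1 and 8. Restricting the query to t_2...t_11 leaves only the occurrence at position 8, and restricting it to t_1...t_10 leaves only the occurrence at position 1, because the second occurrence ends at position 11. An indexed solution aims to answer such queries without scanning the interval.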
