A Compressed Enhanced Suffix Array Supporting Fast String Matching

Index structures like the suffix tree or the suffix array are of utmost importance in stringology, most notably in exact string matching. In the last decade, research on compressed index structures has flourished because the main problem in many applications is the space consumption of the index. It is possible to simulate the matching of a pattern against a suffix tree on an enhanced suffix array by using range minimum queries or the so-called child table. In this paper, we show that the Super-Cartesian tree of the LCP-array (with which the suffix array is enhanced) very naturally explains the child table. More important, however, is the fact that the balanced parentheses representation of this tree constitutes a very natural compressed form of the child table, which allows locating all occ occurrences of a pattern P of length m in O(m log |Σ| + occ) time, where Σ is the underlying alphabet. Our compressed child table uses less space than previous solutions to the problem. An implementation is available.
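To make the idea of simulating a suffix-tree traversal on an enhanced suffix array concrete, the following is a minimal sketch in Python. It is not the paper's compressed child table: the suffix array and LCP-array are built naively, and a linear scan for the minima of an LCP range stands in for the range minimum queries or child table lookups. All function names (suffix_array, lcp_array, child_intervals, find_occurrences) are hypothetical and chosen only for this illustration.

```python
def suffix_array(text):
    # Naive construction: sort suffix start positions lexicographically.
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    # lcp[i] = length of the longest common prefix of the suffixes
    # starting at sa[i-1] and sa[i]; lcp[0] is set to 0 by convention.
    def common(a, b):
        n = 0
        while a + n < len(text) and b + n < len(text) and text[a + n] == text[b + n]:
            n += 1
        return n
    return [0] + [common(sa[i - 1], sa[i]) for i in range(1, len(sa))]


def child_intervals(lcp, i, j):
    # Split the interval [i..j] (inclusive, i < j) at every position holding
    # the minimum of lcp[i+1..j]. A real index would answer this with range
    # minimum queries or a child table instead of a linear scan.
    ell = min(lcp[i + 1:j + 1])
    bounds = [i] + [k for k in range(i + 1, j + 1) if lcp[k] == ell] + [j + 1]
    return ell, [(bounds[t], bounds[t + 1] - 1) for t in range(len(bounds) - 1)]


def find_occurrences(text, sa, lcp, pattern):
    # Top-down traversal of lcp-intervals, mimicking a suffix-tree descent.
    if not sa:
        return []
    i, j = 0, len(sa) - 1
    m = len(pattern)
    depth = 0  # number of pattern characters matched so far
    while depth < m:
        if i == j:
            # Singleton interval (a leaf): compare the remainder directly.
            start = sa[i]
            return [start] if text[start + depth:start + m] == pattern[depth:] else []
        ell, children = child_intervals(lcp, i, j)
        end = min(ell, m)
        # All suffixes in [i..j] share a prefix of length ell, so it is
        # enough to compare against the first suffix of the interval.
        if text[sa[i] + depth:sa[i] + end] != pattern[depth:end]:
            return []
        depth = end
        if depth == m:
            break
        # Descend into the child whose suffixes continue with pattern[depth].
        for a, b in children:
            if sa[a] + depth < len(text) and text[sa[a] + depth] == pattern[depth]:
                i, j = a, b
                break
        else:
            return []
    return sorted(sa[i:j + 1])


text = "banana"
sa = suffix_array(text)
lcp = lcp_array(text, sa)
print(find_occurrences(text, sa, lcp, "ana"))
```

In this sketch each descent step scans the interval for its minima, so it does not achieve the O(m log |Σ| + occ) bound stated above; that bound relies on the constant-time navigation the child table (or its compressed form) provides.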
