Space Efficient Suffix Trees

We first give a representation of a suffix tree that uses \(n \lg n + O(n)\) bits of space and supports searching for a pattern in a given text (over a fixed-size alphabet) in \(O(m)\) time, where \(n\) is the size of the text and \(m\) is the size of the pattern. The structure is quite simple and answers a question raised by Muthukrishnan in [17]. Previous compact representations of suffix trees either had a larger lower-order term in the space bound and relied on an expectation assumption [3], or required more time for searching [5]. Then, surprisingly, we show that we can do even better, by developing a structure that uses a suffix array (and so \(n \lceil \lg n \rceil\) bits) and an additional \(o(n)\) bits. String searching in this structure also takes \(O(m)\) time. Besides supporting string searching, we can also report the number of occurrences of the pattern in the same time using no additional space. In this case the space occupied by our structures is much less than that of many previously known structures for this task. When the alphabet size \(k\) is not a constant, our structures can be easily extended, using standard tricks, either to use the same space but take \(O(m \lg k)\) time for string searching, or to use an additional \(O(m \lg k)\) bits but keep the same \(O(m)\) search time.
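To make the counting query concrete, the following is a minimal sketch of occurrence counting on a plain suffix array. It is not the structure described above: it uses the textbook binary search, which costs \(O(m \lg n)\) time rather than \(O(m)\), and it builds the suffix array by naive sorting. The function names `build_suffix_array` and `count_occurrences` are illustrative assumptions and do not come from the paper.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of `text` in sorted order.

    Naive construction (sorts the suffixes directly); suitable only for
    small illustrative inputs.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def count_occurrences(text, sa, pattern):
    """Count occurrences of `pattern` in `text` using suffix array `sa`.

    All suffixes that start with `pattern` form one contiguous block of
    `sa`, so two binary searches locate the block boundaries.
    """
    m = len(pattern)

    # First suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # First suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return lo - start


text = "banana"
sa = build_suffix_array(text)
print(sa)                                   # [5, 3, 1, 0, 4, 2]
print(count_occurrences(text, sa, "ana"))   # 2
```

The size of the matching block gives the count directly, which is why counting needs no storage beyond the suffix array and the text in this baseline.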

[1] Peter Weiner, et al. Linear Pattern Matching Algorithms, 1973, SWAT.

[2] Eugene W. Myers, et al. Suffix arrays: a new method for on-line string searches, 1993, SODA '90.

[3] Gad M. Landau, et al. Introducing efficient parallelism into approximate string matching and a new serial algorithm, 1986, STOC '86.

[4] Donald R. Morrison, et al. PATRICIA—Practical Algorithm To Retrieve Information Coded in Alphanumeric, 1968, J. ACM.

[5] Franco P. Preparata, et al. Structural Properties of the String Statistics Problem, 1985, J. Comput. Syst. Sci.

[6] Heping Shang. Trie Methods for Text and Spatial Data on Secondary Storage, 1994.

[7] Edward M. McCreight, et al. A Space-Economical Suffix Tree Construction Algorithm, 1976, J. ACM.

[8] Guy Jacobson, et al. Space-efficient static trees and graphs, 1989, 30th Annual Symposium on Foundations of Computer Science.

[9] Alfonso F. Cardenas. Analysis and performance of inverted data base structures, 1975, CACM.

[10] David R. Clark, et al. Efficient suffix trees on secondary storage, 1996, SODA '96.

[11] Venkatesh Raman, et al. Succinct representation of balanced parentheses, static trees and planar graphs, 1997, Proceedings 38th Annual Symposium on Foundations of Computer Science.

[12] Livio Colussi, et al. A Time and Space Efficient Data Structure for String Searching on Large Texts, 1996, Inf. Process. Lett.

[13] Michael Rodeh, et al. Linear Algorithm for Data Compression via String Matching, 1981, J. ACM.