Improving Lookup Time Complexity of Compressed Suffix Arrays using Multi-ary Wavelet Tree

Given a text T of size n, we need to search for the information in which we are interested. To support fast searching, an index must be constructed by preprocessing the text. The suffix array is one such index data structure. The compressed suffix array (CSA) is a compressed index that exploits the regularity of the suffix array, and it can be compressed to the kth-order empirical entropy. In this paper we improve the lookup time complexity of the compressed suffix array by using the multi-ary wavelet tree, at the cost of more space. In our implementation, the lookup time complexity of the compressed suffix array is $O(\log_\sigma^{\epsilon/(1-\epsilon)} n \, \log_r \sigma)$, and its space is $\epsilon^{-1} n H_k(T) + O(n \log\log n / \log_\sigma^{\epsilon} n)$ bits, where σ is the alphabet size, $H_k$ is the kth-order empirical entropy, r is the branching factor of the multi-ary wavelet tree such that $2 \le r \le \sqrt{n}$ and $r \le O(\log_\sigma^{1-\epsilon} n)$, and $0 < \epsilon < 1/2$ is a constant.
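The $\log_r \sigma$ factor in the lookup bound reflects the depth of a wavelet tree with branching factor r. The following is a minimal, deliberately naive sketch of that idea and not the implementation described in this paper: the names `MultiaryWaveletTree` and `_Node` are hypothetical, symbols are assumed to be integers in [0, sigma), and each node answers rank queries with a linear-time `list.count` instead of the succinct constant-time rank structures a real index would use. It only illustrates that a rank query visits about $\lceil \log_r \sigma \rceil$ levels.

```python
class _Node:
    """One wavelet tree node covering the symbol range [lo, hi)."""

    def __init__(self, seq, lo, hi, r):
        self.lo, self.hi = lo, hi
        self.children = []
        if hi - lo <= 1:
            return  # leaf: a single symbol, nothing to store
        # Split [lo, hi) into at most r sub-ranges of equal width (ceiling division).
        self.width = -(-(hi - lo) // r)
        # For every symbol, record which child sub-range it falls into.
        self.digits = [(c - lo) // self.width for c in seq]
        for start in range(lo, hi, self.width):
            end = min(hi, start + self.width)
            sub = [c for c in seq if start <= c < end]
            self.children.append(_Node(sub, start, end, r))


class MultiaryWaveletTree:
    """Naive r-ary wavelet tree over integers in [0, sigma) (illustrative only)."""

    def __init__(self, seq, sigma, r):
        if r < 2 or sigma < 1:
            raise ValueError("need r >= 2 and sigma >= 1")
        self.root = _Node(list(seq), 0, sigma, r)

    def rank(self, c, i):
        """Return (occurrences of c in seq[:i], number of levels visited)."""
        node, levels = self.root, 0
        while node.children:
            d = (c - node.lo) // node.width
            # Assumption: a succinct structure would answer this count in O(1).
            i = node.digits[:i].count(d)
            node = node.children[d]
            levels += 1
        return i, levels

    def access(self, i):
        """Return the symbol stored at position i."""
        node = self.root
        while node.children:
            d = node.digits[i]
            i = node.digits[:i].count(d)
            node = node.children[d]
        return node.lo


text = "mississippi"
alphabet = sorted(set(text))              # ['i', 'm', 'p', 's']
code = {ch: k for k, ch in enumerate(alphabet)}
seq = [code[ch] for ch in text]

binary = MultiaryWaveletTree(seq, len(alphabet), r=2)
quaternary = MultiaryWaveletTree(seq, len(alphabet), r=4)
print(binary.rank(code["s"], len(seq)))      # (4, 2): two levels for r = 2
print(quaternary.rank(code["s"], len(seq)))  # (4, 1): one level for r = 4
```

Raising r from 2 to 4 halves the number of levels visited for this four-symbol alphabet, which is the effect the $\log_r \sigma$ term captures. In this sketch the larger branching factor only changes how the per-node sequences are laid out; the extra space mentioned above would come from the rank structures a succinct implementation needs over r-ary sequences.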