An incomplex algorithm for fast suffix array construction

The suffix array of a string is the permutation of the starting positions of the string's suffixes that lists those suffixes in lexicographic order. We present a practical algorithm for suffix array construction that consists of two easy-to-implement components. First, it sorts the suffixes with respect to a fixed-length prefix; then it refines each bucket of suffixes sharing the same prefix using the order of already sorted suffixes. Other suffix array construction algorithms follow more complex strategies. Moreover, by enhancing our algorithm with further techniques, we achieve very fast construction for common strings as well as for worst-case strings. Copyright © 2006 John Wiley & Sons, Ltd.
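To make the definition and the two-phase structure concrete, the following is a minimal Python sketch and not the algorithm of the paper. The function names and the parameter `q` (the fixed prefix length) are hypothetical. Phase 1 groups suffixes into buckets by their first `q` characters. Phase 2 here simply sorts each bucket by direct comparison of the remaining characters, which stands in for the paper's refinement step that reuses the order of already sorted suffixes.

```python
from itertools import groupby


def suffix_array_naive(text):
    """Reference definition: starting positions ordered by the suffix they begin."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def suffix_array_two_phase(text, q=2):
    """Illustrative two-phase construction (assumed sketch, not the paper's method)."""

    def prefix(i):
        # First q characters of the suffix starting at i (shorter near the end).
        return text[i:i + q]

    # Phase 1: order all suffixes by their length-q prefix only.
    order = sorted(range(len(text)), key=prefix)

    # Phase 2: refine each bucket of suffixes that share the same prefix.
    # Plain comparison of the remainder is used here as a stand-in for
    # refinement based on already sorted suffixes.
    result = []
    for _, bucket in groupby(order, key=prefix):
        result.extend(sorted(bucket, key=lambda i: text[i + q:]))
    return result


print(suffix_array_two_phase("banana"))  # [5, 3, 1, 0, 4, 2]
```

For "banana" the sorted suffixes are "a", "ana", "anana", "banana", "na", "nana", which start at positions 5, 3, 1, 0, 4, 2. The direct comparison in phase 2 can take quadratic time on highly repetitive input, which is the cost a refinement based on previously sorted suffixes is meant to avoid.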