An incomplex algorithm for fast suffix array construction

The suffix array of a string is the permutation of the starting positions of its suffixes that lists those suffixes in lexicographic order. We present a practical algorithm for suffix array construction that consists of two easy-to-implement components. First, it sorts the suffixes with respect to a fixed-length prefix; then it refines each bucket of suffixes sharing the same prefix, using the order of suffixes that have already been sorted. Other suffix array construction algorithms follow more complex strategies. Moreover, by enhancing our algorithm with further techniques, we achieve very fast construction for common strings as well as for worst-case strings. Copyright © 2006 John Wiley & Sons, Ltd.
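The two components described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `suffix_array`, the parameter `q` (the fixed prefix length), and the `rank` array are assumptions introduced here, and the refinement step uses a simple prefix-doubling scheme as a stand-in for the paper's refinement technique. It only shows the general idea of bucketing by a short prefix and then ordering each bucket by the buckets of suffixes that start further along in the string.

```python
def suffix_array(text: str, q: int = 2) -> list[int]:
    """Illustrative two-phase suffix array construction (not optimized)."""
    n = len(text)
    if n == 0:
        return []

    # Phase 1: sort all suffixes by their first q characters.
    sa = sorted(range(n), key=lambda i: text[i:i + q])

    # rank[i] = index in sa where the bucket containing suffix i starts.
    rank = [0] * n
    for pos in range(1, n):
        cur, prev = sa[pos], sa[pos - 1]
        same = text[cur:cur + q] == text[prev:prev + q]
        rank[cur] = rank[prev] if same else pos

    # Phase 2: refine each bucket using the ranks of the suffixes that
    # start h characters later; those are already ordered by h characters.
    h = q
    while True:
        def key(i: int) -> int:
            # A suffix that ends within h characters sorts first.
            return rank[i + h] if i + h < n else -1

        new_rank = rank[:]
        refined_any = False
        start = 0
        while start < n:
            # Find the end of the bucket that begins at 'start'.
            end = start + 1
            while end < n and rank[sa[end]] == rank[sa[start]]:
                end += 1
            if end - start > 1:
                refined_any = True
                bucket = sorted(sa[start:end], key=key)
                sa[start:end] = bucket
                # Split the bucket into sub-buckets of equal keys.
                new_rank[bucket[0]] = start
                for k in range(1, len(bucket)):
                    if key(bucket[k]) == key(bucket[k - 1]):
                        new_rank[bucket[k]] = new_rank[bucket[k - 1]]
                    else:
                        new_rank[bucket[k]] = start + k
            start = end
        rank = new_rank
        if not refined_any:
            return sa  # every bucket is a singleton
        h *= 2  # suffixes are now ordered by their first 2h characters


print(suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```

In this sketch, a larger `q` leaves fewer and smaller buckets for the second phase, at the cost of a more expensive initial sort; the choice of `q = 2` is arbitrary.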
