An Efficient Algorithm for Suffix Sorting

The suffix array (SA) is a fundamental data structure that is widely used in applications such as string matching, text indexing, and computational biology. Sorting the suffixes of a string in lexicographical order is the central problem in constructing an SA, and one of the most widely used suffix sorting algorithms is qsufsort. However, qsufsort has one critical limitation: the order of suffixes that start with the same 2^k characters cannot be determined in the kth round. To address this limitation, this paper proposes an efficient suffix sorting algorithm called dsufsort. Specifically, dsufsort maintains the depth of each unsorted portion of the SA and sorts the suffixes in each round based on that depth. As a result, some suffixes that qsufsort cannot sort in a given round can now be sorted in that round. Later rounds can then use more of the sorting results from the current round, which reduces the total number of sorting rounds and makes dsufsort more efficient than qsufsort. The experimental results show the effectiveness of the proposed algorithm, especially for highly repetitive texts.
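The round structure behind the limitation described above can be illustrated with a minimal prefix-doubling sketch. This is neither qsufsort nor dsufsort: it is a simplified textbook variant that re-sorts all suffixes in every round, and the function name `suffix_array_prefix_doubling` is hypothetical. It only shows that, after round k, suffixes are ordered by their first 2^k characters, so suffixes sharing that prefix remain tied and force another round.

```python
def suffix_array_prefix_doubling(s):
    """Return (suffix array, number of doubling rounds) for string s.

    Illustrative sketch only: a naive prefix-doubling sort, not the
    qsufsort or dsufsort algorithm discussed in the abstract.
    """
    n = len(s)
    if n == 0:
        return [], 0
    # rank[i] orders suffix i by its first h characters (initially h = 1).
    rank = [ord(c) for c in s]
    sa = list(range(n))
    h = 1
    rounds = 0
    while True:
        # Combine the rank of the first h characters with the rank of the
        # next h characters; -1 marks a suffix that ends before position i + h.
        def key(i):
            return (rank[i], rank[i + h] if i + h < n else -1)

        sa.sort(key=key)
        # Suffixes with equal keys share their first 2h characters and stay tied.
        new_rank = [0] * n
        for j in range(1, n):
            differs = key(sa[j]) != key(sa[j - 1])
            new_rank[sa[j]] = new_rank[sa[j - 1]] + (1 if differs else 0)
        rank = new_rank
        rounds += 1
        if rank[sa[-1]] == n - 1:  # every suffix has a unique rank: done
            return sa, rounds
        h *= 2


if __name__ == "__main__":
    print(suffix_array_prefix_doubling("banana"))
    print(suffix_array_prefix_doubling("aaaaaaaa"))
```

In this sketch, a highly repetitive input such as a run of identical characters needs more rounds than a short non-repetitive string of similar length, because long shared prefixes keep suffixes tied. This is the kind of input for which the abstract reports the largest benefit from reducing the number of rounds.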
