Faster semi-external suffix sorting

Suffix array (SA) construction is a time and memory bottleneck in many string processing applications. In this paper we improve the runtime of a small-space (semi-external) SA construction algorithm by Kärkkäinen (TCS, 2007) [5]. In practice we achieve a speedup of 2 to 4 times without increasing memory usage. Our main contribution is a memory-efficient way to implement the "pointer copying" heuristic, which is used in less space-efficient SA construction algorithms.

We improve the practical performance of a suffix sorting algorithm due to Kärkkäinen. We adapt powerful heuristics from the large-memory setting to the semi-external setting. We show that the new algorithm offers relevant space-time tradeoffs in practice.
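For readers unfamiliar with the object being constructed, the following is a minimal sketch of what a suffix array is. It is not the algorithm of this paper or of Kärkkäinen: the function name `naive_suffix_array` is hypothetical, and the approach simply sorts all suffixes directly, which costs far more time and memory than the small-space methods discussed here.

```python
def naive_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of `text`,
    ordered lexicographically by the suffix they begin.

    Illustrative only: each comparison key is a full suffix copy,
    so this is unsuitable for large inputs.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


# Suffixes of "banana" in sorted order:
# a(5), ana(3), anana(1), banana(0), na(4), nana(2)
sa = naive_suffix_array("banana")
print(sa)  # [5, 3, 1, 0, 4, 2]
```

The sketch only fixes the expected output; the difficulty addressed by suffix sorting algorithms is producing the same array quickly while keeping working memory small.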

[1]  Juha Kärkkäinen,et al.  Fast Lightweight Suffix Array Construction and Checking , 2003, CPM.

[2]  Gonzalo Navarro,et al.  Compressed full-text indexes , 2007, CSUR.

[3]  Juha Kärkkäinen,et al.  Fast BWT in small space by blockwise suffix sorting , 2007, Theor. Comput. Sci..

[4]  Robert Sedgewick,et al.  Fast algorithms for sorting and searching strings , 1997, SODA '97.

[5]  Johann van der Merwe,et al.  A survey on peer-to-peer key management for mobile ad hoc networks , 2007, CSUR.

[6]  Simon J. Puglisi,et al.  An efficient, versatile approach to suffix sorting , 2008, JEAL.

[7]  Travis Gagie,et al.  Lightweight Data Indexing and Compression in External Memory , 2009, Algorithmica.

[8]  Juha Kärkkäinen,et al.  Fixed Block Compression Boosting in FM-Indexes , 2011, SPIRE.

[9]  Srinivas Aluru,et al.  Space efficient linear time construction of suffix arrays , 2005, J. Discrete Algorithms.

[10]  Gang Chen,et al.  Fast and Practical Algorithms for Computing All the Runs in a String , 2007, CPM.

[11]  D. J. Wheeler,et al.  A Block-sorting Lossless Data Compression Algorithm , 1994 .

[12]  Hozumi Tanaka,et al.  An efficient method for in memory construction of suffix arrays , 1999, 6th International Symposium on String Processing and Information Retrieval. 5th International Workshop on Groupware.

[13]  Simon J. Puglisi,et al.  Space-Time Tradeoffs for Longest-Common-Prefix Array Computation , 2008, ISAAC.

[14]  Julian Seward.  On the performance of BWT sorting algorithms , 2000, Proceedings DCC 2000. Data Compression Conference.

[15]  William F. Smyth,et al.  A taxonomy of suffix array construction algorithms , 2007, CSUR.