Efficient Algorithms for the Inverse Sort Transform

As an important variant of the Burrows-Wheeler Transform (BWT), the Sort Transform (ST) speeds up the forward transformation by sorting the matrix on only its k-order contexts rather than on entire rows. However, because the currently known inverse ST algorithms must retrieve the complete k-order contexts and rely on hash tables, they are less efficient than the inverse BWT. In this paper, we propose three fast and memory-efficient inverse ST algorithms. The first algorithm replaces the hash tables with two auxiliary vectors and achieves O(kN) time and space complexity for a text of N characters under context order k. The second uses two additional compact "alternate vectors" to eliminate the need to restore all of the k-order contexts, reducing the space complexity to O(N). The third uses a "doubling technique" to further reduce the time complexity to O(N log_2 k). The hallmark of these three algorithms is that they invert the ST in a manner similar to inverting the BWT: they all use precalculated auxiliary mapping vectors and require no hash tables. These unifying algorithms also better explain the connection between the BWT and the ST: not only can their forward components be performed by the same algorithmic framework, but their inverse components can also be carried out efficiently by the unifying framework proposed in the present work.
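To make the setting concrete, the following Python sketch illustrates the two ingredients the abstract builds on: a forward ST of order k, and the standard inverse BWT driven by a precomputed mapping vector rather than a hash table. It is not one of the three inverse ST algorithms proposed in the paper. The function names `sort_transform` and `inverse_bwt`, the rotation-based formulation, and the use of a primary index (the row holding the original text) are assumptions made for illustration only.

```python
def sort_transform(text, k):
    """Forward ST of order k (illustrative, quadratic-space-free but not optimized).

    Rotations are sorted by their first k characters only; Python's sort is
    stable, so ties keep their original text order. With k >= len(text) the
    result coincides with the BWT of the rotation matrix.
    Returns (last_column, primary_index).
    """
    n = len(text)
    k = min(k, n)                      # a context cannot exceed one full rotation
    doubled = text + text              # lets us slice rotations without modulo
    order = sorted(range(n), key=lambda i: doubled[i:i + k])
    last_column = "".join(doubled[i + n - 1] for i in order)
    return last_column, order.index(0)


def inverse_bwt(last_column, primary_index):
    """Standard inverse BWT using one precomputed mapping vector.

    mapping[r] is the position in the last column of the character that
    appears at row r of the first column (matched by a stable sort), which
    is also the row of the rotation starting one position later in the text.
    """
    n = len(last_column)
    mapping = sorted(range(n), key=lambda j: last_column[j])
    first_column = [last_column[j] for j in mapping]
    out = []
    row = primary_index
    for _ in range(n):
        out.append(first_column[row])  # first character of the current rotation
        row = mapping[row]             # move to the rotation shifted left by one
    return "".join(out)
```

For the text "banana", `sort_transform("banana", 6)` gives the BWT last column "nnbaaa" with primary index 3, and `inverse_bwt("nnbaaa", 3)` recovers "banana". With k = 1 the forward output is "bnnaaa" instead, because rows sharing a context keep their text order; this is why the plain BWT mapping vector alone does not invert the ST and why the inverse ST needs additional context information.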
