The Alternating BWT: an algorithmic perspective

The Burrows-Wheeler Transform (BWT) is a word transformation introduced in 1994 for data compression. It has become a fundamental tool for designing self-indexing data structures, with important applications in several areas of science and engineering. The Alternating Burrows-Wheeler Transform (ABWT) is another transformation, recently introduced in [Gessel et al. 2012] and studied in the field of Combinatorics on Words. It is analogous to the BWT, except that it uses an alternating lexicographical order instead of the usual one. Building on results in [Giancarlo et al. 2018], where we showed that the BWT and the ABWT are part of a larger class of reversible transformations, here we provide a combinatorial and algorithmic study of the ABWT. We establish a deep analogy between the BWT and the ABWT by proving that they are the only transformations in the above-mentioned class that are rank-invertible, a novel notion guaranteeing efficient invertibility. In addition, we show that the backward-search procedure can be efficiently generalized to the ABWT; this result implies that the ABWT can also be used as a basis for efficient compressed full-text indices. Finally, we prove that the ABWT can be efficiently computed by combining the Difference Cover suffix sorting algorithm [Karkkainen et al., 2006] with a linear-time algorithm for finding the minimal cyclic rotation of a word with respect to the alternating lexicographical order.
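To make the difference between the two transforms concrete, the following is a minimal sketch that computes both by naively sorting all cyclic rotations of a word. It assumes the definition of the alternating lexicographical order commonly used in the literature, which the abstract does not spell out: two equal-length words are compared at their first differing position, using the usual character order if that position is odd (counting from 1) and the reversed order if it is even. The function names `rotations`, `alt_key`, `bwt` and `abwt` are illustrative only, and this quadratic-space approach is not the Difference Cover based construction referred to above.

```python
def rotations(w):
    """Return all cyclic rotations of the word w."""
    return [w[i:] + w[:i] for i in range(len(w))]


def alt_key(word):
    """Sort key for the (assumed) alternating lexicographical order.

    Characters at odd positions (1-based) keep their code point, so they
    compare in the usual order; characters at even positions are negated,
    so they compare in reverse order.
    """
    return tuple(ord(c) if i % 2 == 0 else -ord(c) for i, c in enumerate(word))


def bwt(w):
    """Last column of the rotations sorted in the usual lexicographical order."""
    return "".join(r[-1] for r in sorted(rotations(w)))


def abwt(w):
    """Last column of the rotations sorted in the alternating order."""
    return "".join(r[-1] for r in sorted(rotations(w), key=alt_key))


print(bwt("banana"))   # nnbaaa
print(abwt("banana"))  # bnnaaa
```

The sketch returns only the transformed word. Inverting either transform also requires the position of the original word among the sorted rotations, which is omitted here for brevity.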

[1]  Raffaele Giancarlo,et al.  A New Class of Searchable and Provably Highly Compressible String Transformations , 2019, CPM.

[2]  Luca Q. Zamboni,et al.  Clustering Words and Interval Exchanges , 2013 .

[3]  Giovanni Manzini,et al.  Engineering a Lightweight Suffix Array Construction Algorithm , 2004, Algorithmica.

[4]  Xiangde Zhang,et al.  The Burrows-Wheeler similarity distribution between biological sequences based on Burrows-Wheeler transform. , 2010, Journal of theoretical biology.

[5]  Gonzalo Navarro,et al.  Compact Data Structures - A Practical Approach , 2016 .

[6]  Antonio Restivo,et al.  On generalized Lyndon words , 2018, Theor. Comput. Sci..

[7]  Antonio Restivo,et al.  A New Combinatorial Approach to Sequence Comparison , 2005, Theory of Computing Systems.

[8]  Asako Koike,et al.  Ultrafast SNP analysis using the Burrows-Wheeler transform of short-read data , 2015, Bioinform..

[9]  Giovanna Rosone,et al.  Lightweight LCP construction for very large collections of strings , 2016, J. Discrete Algorithms.

[10]  Peter Sanders,et al.  Linear work suffix array construction , 2006, JACM.

[11]  D. J. Wheeler,et al.  A Block-sorting Lossless Data Compression Algorithm , 1994 .

[12]  Giovanna Rosone,et al.  SNPs detection by eBWT positional clustering , 2019, Algorithms for Molecular Biology.

[13]  M. Schindler,et al.  A fast block-sorting algorithm for lossless data compression , 1997, Proceedings DCC '97. Data Compression Conference.

[14]  Igor Pak,et al.  Long cycles in abc-permutations , 2008 .

[15]  Maxime Crochemore,et al.  A note on the Burrows-Wheeler transformation , 2005, ArXiv.

[16]  Yossi Shiloach,et al.  Fast Canonization of Circular Strings , 1981, J. Algorithms.

[17]  Antonio Restivo,et al.  A bijection between words and multisets of necklaces , 2012, Eur. J. Comb..

[18]  Antonio Restivo,et al.  From first principles to the Burrows and Wheeler transform and beyond, via combinatorial optimization , 2007, Theor. Comput. Sci..

[19]  Antonio Restivo,et al.  Block Sorting-Based Transformations on Words: Beyond the Magic BWT , 2018, DLT.

[20]  Kellogg S. Booth,et al.  Lexicographically Least Circular Substrings , 1980, Inf. Process. Lett..

[21]  Richard Durbin,et al.  Fast and accurate long-read alignment with Burrows–Wheeler transform , 2010, Bioinform..

[22]  Raffaele Giancarlo,et al.  Boosting textual compression in optimal linear time , 2005, JACM.

[23]  Antonio Restivo,et al.  Burrows-Wheeler transform and Sturmian words , 2003, Inf. Process. Lett..

[24]  Giovanni Manzini,et al.  Opportunistic data structures with applications , 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[25]  Giovanna Rosone,et al.  The Burrows-Wheeler Transform between Data Compression and Combinatorics on Words , 2013, CiE.

[26]  Peter M. Fenwick.  The Burrows-Wheeler Transform for Block Sorting Text Compression: Principles and Improvements , 1996, Comput. J..

[27]  Giovanni Manzini,et al.  An analysis of the Burrows-Wheeler transform , 2001, SODA '99.

[28]  Alexandru I. Tomescu,et al.  Genome-Scale Algorithm Design: Biological Sequence Analysis in the Era of High-Throughput Sequencing , 2015 .

[29]  Antonio Restivo,et al.  An extension of the Burrows-Wheeler Transform , 2007, Theor. Comput. Sci..

[30]  Peter Sanders,et al.  Simple Linear Work Suffix Array Construction , 2003, ICALP.

[31]  Travis Gagie,et al.  Wheeler graphs: A framework for BWT-based data structures , 2017, Theor. Comput. Sci..

[32]  Stephen R. Tate,et al.  Higher compression from the Burrows-Wheeler transform by modified sorting , 1998, Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225).

[33]  M. Lothaire,et al.  Applied Combinatorics on Words , 2005 .

[34]  Paolo Ferragina,et al.  On Optimally Partitioning a Text to Improve Its Compression , 2009, Algorithmica.

[35]  Gonzalo Navarro,et al.  Optimal Lower and Upper Bounds for Representing Sequences , 2011, TALG.

[36]  Antonio Restivo,et al.  Measuring the clustering effect of BWT via RLE , 2017, Theor. Comput. Sci..

[37]  Giovanni Manzini,et al.  External memory BWT and LCP computation for sequence collections with applications , 2019, Algorithms for molecular biology : AMB.

[38]  Antonio Restivo,et al.  Burrows-Wheeler Transform and Run-Length Enconding , 2017, WORDS.

[39]  Giovanna Rosone,et al.  Large-scale compression of genomic sequence databases with the Burrows-Wheeler transform , 2012, Bioinform..

[40]  Antonio Restivo,et al.  Balancing and clustering of words in the Burrows-Wheeler transform , 2011, Theor. Comput. Sci..

[41]  Antonio Restivo,et al.  Sorting conjugates and Suffixes of Words in a Multiset , 2014, Int. J. Found. Comput. Sci..

[42]  Richard Durbin,et al.  Fast and accurate short read alignment with Burrows–Wheeler transform , 2009 .

[43]  Dan Gusfield.  Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology , 1997 .

[44]  Charles J. Colbourn,et al.  Quorums from difference covers , 2000, Inf. Process. Lett..

[45]  Antonio Restivo,et al.  Burrows-Wheeler transform and palindromic richness , 2009, Theor. Comput. Sci..

[46]  Simon J. Puglisi,et al.  Words with Simple Burrows-Wheeler Transforms , 2008, Electron. J. Comb..

[47]  Paolo Ferragina,et al.  Indexing compressed text , 2005, JACM.

[48]  Antonio Restivo,et al.  Distance measures for biological sequences: Some recent approaches , 2008, Int. J. Approx. Reason..

[49]  Arnaud Lefebvre,et al.  A survey of string orderings and their application to the Burrows-Wheeler transform , 2017, Theor. Comput. Sci..

[50]  Ira M. Gessel,et al.  Counting Permutations with Given Cycle Structure and Descent Set , 1993, J. Comb. Theory A.

[51]  M. Lothaire.  Applied Combinatorics on Words (Encyclopedia of Mathematics and its Applications) , 2005 .

[52]  Jean Pierre Duval,et al.  Factorizing Words over an Ordered Alphabet , 1983, J. Algorithms.