Relative FM-Indexes

Intuitively, if two strings S1 and S2 are sufficiently similar and we already have an FM-index for S1, then by storing a little extra information we should be able to reuse parts of that index in an FM-index for S2. We formalize this intuition and show that it can lead to significant space savings in practice, as well as to some interesting theoretical problems.
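The abstract does not spell out which parts of the index are reused, so the following is only a minimal sketch of the intuition, under the assumption that similarity between S1 and S2 shows up as a long common subsequence of their Burrows-Wheeler transforms (the sequence an FM-index is built over). The function names `bwt` and `lcs_length` and the two example strings are hypothetical, and the quadratic algorithms are chosen for readability, not for use on real genomes.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (illustration only)."""
    s = text + "$"  # sentinel; assumed to sort before every character of text
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def lcs_length(a: str, b: str) -> int:
    """Length of a longest common subsequence, by row-wise dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            if ca == cb:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


# Two hypothetical strings that differ in a single position.
s1 = "GATTACAGATTACAGATTACA"
s2 = "GATTACAGATCACAGATTACA"

b1, b2 = bwt(s1), bwt(s2)
shared = lcs_length(b1, b2)

# Characters of bwt(s2) outside the common subsequence are the
# "extra information" that would have to be stored explicitly.
print(f"BWT length: {len(b2)}, shared with bwt(s1): {shared}, extra: {len(b2) - shared}")
```

Under this assumption, the larger `shared` is relative to the BWT length, the more of the existing index for S1 could in principle be reused for S2.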
