Fast construction of FM-index for long sequence reads

SUMMARY: We present a new method to incrementally construct the FM-index for both short and long sequence reads, up to the size of a genome. It is the first algorithm that can build the index while implicitly sorting the sequences in reverse (complement) lexicographical order, without a separate sorting step. The implementation is among the fastest for indexing short reads and the only one that works in practice for reads averaging kilobases in length.

AVAILABILITY AND IMPLEMENTATION: https://github.com/lh3/ropebwt2

CONTACT: hengli@broadinstitute.org
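
To make "incremental construction" concrete, the following is a minimal Python sketch of inserting sequences one at a time into the Burrows-Wheeler transform (BWT) that underlies an FM-index. It is an illustration under stated assumptions, not the ropebwt2 implementation: the names `insert_sequence`, `build_bwt` and `naive_bwt` are hypothetical, the BWT is held in a plain Python list (so each insertion takes linear time), sequences are assumed to be uppercase strings over A/C/G/T, and sequences are kept in input order rather than being implicitly sorted as the summary describes.

```python
def insert_sequence(bwt, seq):
    """Insert one sequence into the BWT (a list of characters) in place.

    Assumption: every sequence ends with its own sentinel, written '$',
    and sentinels are ordered by insertion order.
    """
    # The new sentinel sorts after every existing sentinel and before all
    # other suffixes, so its row index equals the number of stored sequences.
    k = bwt.count("$")
    for c in reversed(seq):
        # c precedes the suffix at row k, so it is that row's BWT symbol.
        bwt.insert(k, c)
        # LF-mapping to the row of the suffix that starts with c:
        #   symbols smaller than c already in the BWT
        #   + 1 for the new sentinel, which is not in the BWT yet
        #   + occurrences of c above row k
        smaller = sum(1 for x in bwt if x < c)
        k = smaller + 1 + bwt[:k].count(c)
    # The full sequence is preceded by its own sentinel.
    bwt.insert(k, "$")


def build_bwt(seqs):
    """Build the BWT of a collection by inserting sequences one by one."""
    bwt = []
    for s in seqs:
        insert_sequence(bwt, s)
    return "".join(bwt)


def naive_bwt(seqs):
    """Reference BWT obtained by explicitly sorting all suffixes."""
    rows = []
    for j, s in enumerate(seqs):
        for i in range(len(s) + 1):
            # (1, ch) for ordinary symbols, (0, j) for the j-th sentinel,
            # so sentinels sort first and are ordered by sequence index.
            key = [(1, ch) for ch in s[i:]] + [(0, j)]
            prev = s[i - 1] if i > 0 else "$"
            rows.append((key, prev))
    rows.sort(key=lambda r: r[0])
    return "".join(prev for _, prev in rows)


if __name__ == "__main__":
    reads = ["GA", "CA"]
    print(build_bwt(reads))  # AAGC$$
    assert build_bwt(reads) == naive_bwt(reads)
```

The point of the sketch is that no suffix array of the whole collection is ever built: each new read is threaded into the existing BWT with rank queries alone. A practical implementation would replace the Python list with a dynamic structure that supports fast rank and insert operations.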