Paper information - Fast construction of FM-index for long sequence reads

Fast construction of FM-index for long sequence reads

SUMMARY: We present a new method to incrementally construct the FM-index for both short and long sequence reads, up to the size of a genome. It is the first algorithm that can build the index while implicitly sorting the sequences in reverse (complement) lexicographical order, without a separate sorting step. The implementation is among the fastest for indexing short reads and the only one that works in practice for reads averaging kilobases in length.

AVAILABILITY AND IMPLEMENTATION: https://github.com/lh3/ropebwt2

CONTACT: hengli@broadinstitute.org
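To make the terms in the summary concrete, here is a minimal sketch of an FM-index over a small collection of reads. It is not the incremental method of the paper or the ropebwt2 implementation: it builds the BWT by naively sorting all suffixes, which is only practical for toy inputs. The function names (`build_bwt`, `build_fm_index`, `count_occurrences`) are hypothetical. The sketch also assumes that each read ends in its own `$` sentinel and that sentinels are ordered by input position, rather than the reverse (complement) lexicographical ordering described above.

```python
def build_bwt(reads):
    """Naive BWT of a read collection (assumes reads contain no '$')."""
    suffixes = []
    for read_index, read in enumerate(reads):
        text = read + "$"  # one sentinel per read
        for start in range(len(text)):
            # text[start - 1] is the character preceding the suffix;
            # for start == 0 it wraps around to the read's own sentinel.
            suffixes.append((text[start:], read_index, text[start - 1]))
    # Sort by suffix, breaking ties between identical suffixes by read order.
    suffixes.sort(key=lambda item: (item[0], item[1]))
    return "".join(preceding for _, _, preceding in suffixes)


def build_fm_index(bwt):
    """Return (C, occ): C[c] counts symbols smaller than c in the BWT,
    and occ[c][i] counts occurrences of c in bwt[:i]."""
    alphabet = sorted(set(bwt))
    C, total = {}, 0
    for symbol in alphabet:
        C[symbol] = total
        total += bwt.count(symbol)
    occ = {symbol: [0] for symbol in alphabet}
    for ch in bwt:
        for symbol in alphabet:
            occ[symbol].append(occ[symbol][-1] + (ch == symbol))
    return C, occ


def count_occurrences(pattern, bwt, C, occ):
    """Count occurrences of pattern across all reads by backward search."""
    lo, hi = 0, len(bwt)  # half-open interval of matching suffixes
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


reads = ["ACGT", "CGTA", "GTAC"]
bwt = build_bwt(reads)
C, occ = build_fm_index(bwt)
print(bwt)
print(count_occurrences("GT", bwt, C, occ))  # "GT" occurs once in each read
```

The dense `occ` table keeps the sketch short but uses memory proportional to the alphabet size times the BWT length. Practical tools use compressed or sampled rank structures instead.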

Heng Li
