Paper information - deSAMBA: fast and accurate classification of metagenomics long reads with sparse approximate matches

Summary: Long-read sequencing technologies are promising for metagenomics studies. However, there is still a lack of read classification tools that can quickly and accurately identify the taxonomies of noisy long reads, which is a bottleneck to the use of long-read sequencing. Herein, we propose deSAMBA, a tailored long-read classification approach that uses a novel pseudo-alignment algorithm based on sparse approximate match blocks (SAMBs). Benchmarks on real datasets demonstrate that deSAMBA simultaneously achieves fast speed and good classification yields, outperforming state-of-the-art tools, and that it has considerable potential for cutting-edge metagenomics studies.

Availability and Implementation: https://github.com/hitbc/deSAMBA

Supplementary information:
