deSAMBA: fast and accurate classification of metagenomics long reads with sparse approximate matches

Summary: Long-read sequencing technologies are promising for metagenomics studies. However, there is still a lack of read classification tools that can quickly and accurately identify the taxonomies of noisy long reads, which is a bottleneck to the use of long-read sequencing. Herein, we propose deSAMBA, a tailored long-read classification approach that uses a novel pseudo-alignment algorithm based on sparse approximate match blocks (SAMBs). Benchmarks on real datasets demonstrate that deSAMBA simultaneously achieves fast speed and good classification yields, outperforming state-of-the-art tools, and has great potential for cutting-edge metagenomics studies. Availability and Implementation: https://github.com/hitbc/deSAMBA. Supplementary information: