Kraken: ultrafast metagenomic sequence classification using exact alignments

Kraken is an ultrafast and highly accurate program for assigning taxonomic labels to metagenomic DNA sequences. Previous programs designed for this task have been relatively slow and computationally expensive, forcing researchers to use faster abundance estimation programs, which only classify small subsets of metagenomic data. Using exact alignment of k-mers, Kraken achieves classification accuracy comparable to the fastest BLAST program. In its fastest mode, Kraken classifies 100 base pair reads at a rate of over 4.1 million reads per minute, 909 times faster than Megablast and 11 times faster than the abundance estimation program MetaPhlAn. Kraken is available at http://ccb.jhu.edu/software/kraken/.

[1]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[2]  P. Hawkey,et al.  Ecology of Haemophilus influenzae and Haemophilus parainfluenzae in sputum and saliva and effects of antibiotics on their distribution in patients with lower respiratory tract infections , 1993, Antimicrobial Agents and Chemotherapy.

[3]  J. Karjalainen,et al.  Transmission of oral Prevotella melaninogenica between a mother and her young child. , 1994, Oral microbiology and immunology.

[4]  O. White,et al.  Environmental Genome Shotgun Sequencing of the Sargasso Sea , 2004, Science.

[5]  Michael Roberts,et al.  Reducing storage requirements for biological sequence comparison , 2004, Bioinform..

[6]  J. Banfield,et al.  Community structure and metabolism through reconstruction of microbial genomes from the environment , 2004, Nature.

[7]  A. Salamov,et al.  Use of simulated data sets to evaluate the fidelity of metagenomic processing methods , 2007, Nature Methods.

[8]  Alexander F. Auch,et al.  MEGAN analysis of metagenomic data. , 2007, Genome research.

[9]  Gail L. Rosen,et al.  Metagenome Fragment Classification Using N-Mer Frequency Profiles , 2008, Adv. Bioinformatics.

[10]  Ning Ma,et al.  BLAST+: architecture and applications , 2009, BMC Bioinformatics.

[11]  S. Salzberg,et al.  Phymm and PhymmBL: Metagenomic Phylogenetic Classification with Interpolated Markov Models , 2009, Nature Methods.

[12]  M. Pop,et al.  Accurate and fast estimation of taxonomic profiles from metagenomic shotgun sequences , 2011, BMC Genomics.

[13]  Adam M. Phillippy,et al.  Interactive metagenomic visualization in a Web browser , 2011, BMC Bioinformatics.

[14]  S. Salzberg,et al.  PhymmBL expanded: confidence scores, custom databases, parallelization and more , 2011, Nature Methods.

[15]  Carl Kingsford,et al.  A fast, lock-free approach for efficient parallel counting of occurrences of k-mers , 2011, Bioinform..

[16]  Tatiana A. Tatusova,et al.  NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy , 2011, Nucleic Acids Res..

[17]  Katherine H. Huang,et al.  Structure, Function and Diversity of the Healthy Human Microbiome , 2012, Nature.

[18]  C. Huttenhower,et al.  Metagenomic microbial community profiling using unique clade-specific marker genes , 2012, Nature Methods.

[19]  Daniel D. Sommer,et al.  MetAMOS: a modular and open source metagenomic assembly and analysis pipeline , 2013, Genome Biology.

[20]  G. Svensäter,et al.  Salivary proteins promote proteolytic activity in Streptococcus mitis biovar 2 and Streptococcus mutans. , 2012, Molecular oral microbiology.

[21]  Maya Gokhale,et al.  Scalable metagenomic taxonomy classification using a reference genome database , 2013, Bioinform..

[22]  Michael Roberts,et al.  The MaSuRCA genome assembler , 2013, Bioinform..

[23]  Steven Salzberg,et al.  GAGE-B: an evaluation of genome assemblers for bacterial organisms , 2013, Bioinform..