MetaFast: fast reference-free graph-based comparison of shotgun metagenomic data

MOTIVATION High-throughput metagenomic sequencing has revolutionized our view of the structure and metabolic potential of microbial communities. However, analysis of metagenomic composition is often complicated by the high complexity of the community and the lack of related reference genomic sequences. As a starting point for comparative metagenomic analysis, researchers need efficient means of assessing the pairwise similarity of metagenomes (beta-diversity). A number of approaches have been used to address this task, but most of them have inherent disadvantages that limit their scope of applicability. For instance, reference-based methods perform poorly on metagenomes from previously unstudied niches, while composition-based methods are too abstract for straightforward interpretation and do not allow differentially abundant features to be identified.

RESULTS We developed MetaFast, an approach that represents a shotgun metagenome from an arbitrary environment as a modified de Bruijn graph consisting of simplified components. For multiple metagenomes, the resulting representation is used to obtain a pairwise similarity matrix. The dimensional structure of the metagenomic components preserved in our algorithm reflects the inherent subspecies-level diversity of microbiota. The method is computationally efficient and especially promising for the analysis of metagenomes from novel environmental niches.

AVAILABILITY AND IMPLEMENTATION Source code and binaries are freely available for download at https://github.com/ctlab/metafast. The code is written in Java and is platform independent (tested on Linux and Windows x86_64).

CONTACT ulyantsev@rain.ifmo.ru

SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
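The idea summarized under RESULTS (k-mers, graph components, pairwise matrix) can be illustrated with a toy sketch. This is not the MetaFast implementation: the function names, the choice of k, the use of plain connected components as "components", and the Bray-Curtis measure are all assumptions made for illustration, and the sketch omits reverse complements, error filtering and graph simplification.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def graph_components(kmers, k):
    """Connected components of a de Bruijn graph whose nodes are k-mers.

    Two k-mers are linked when the (k-1)-suffix of one equals the
    (k-1)-prefix of the other. Returns a dict: k-mer -> component id.
    """
    by_prefix = {}
    for kmer in kmers:
        by_prefix.setdefault(kmer[:k - 1], []).append(kmer)

    parent = {kmer: kmer for kmer in kmers}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for kmer in kmers:
        for nxt in by_prefix.get(kmer[1:], []):
            parent[find(kmer)] = find(nxt)

    roots = sorted({find(kmer) for kmer in kmers})
    index = {root: i for i, root in enumerate(roots)}
    return {kmer: index[find(kmer)] for kmer in kmers}


def component_profile(counts, component_of, n_components):
    """Sum a sample's k-mer counts within each component."""
    profile = [0] * n_components
    for kmer, c in counts.items():
        profile[component_of[kmer]] += c
    return profile


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity of two non-negative vectors (assumed measure)."""
    total = sum(u) + sum(v)
    if total == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(u, v)) / total


def dissimilarity_matrix(samples, k=4):
    """Pairwise dissimilarity between samples, each given as a list of reads."""
    counts = [count_kmers(reads, k) for reads in samples]
    all_kmers = sorted(set().union(*counts))
    component_of = graph_components(all_kmers, k)
    n_components = 1 + max(component_of.values(), default=-1)
    profiles = [component_profile(c, component_of, n_components) for c in counts]
    return [[bray_curtis(p, q) for q in profiles] for p in profiles]


if __name__ == "__main__":
    toy_samples = [["ACGTACGT"], ["ACGTACGT"], ["TTTTTTTT"]]
    for row in dissimilarity_matrix(toy_samples, k=4):
        print(row)
```

In this toy input the first two samples are identical and the third shares no k-mers with them, so the matrix separates them completely. Real metagenomes would need a much larger k and memory-efficient k-mer storage.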
