MetLab: An In Silico Experimental Design, Simulation and Analysis Tool for Viral Metagenomics Studies

Metagenomics, the sequence characterization of all genomes within a sample, is widely used both as a virus discovery tool and as a tool to study the viral diversity of animals. Metagenomics can be considered to have three main steps: sample collection and preparation, sequencing, and bioinformatic analysis. Bioinformatic analysis of metagenomic datasets is in itself a complex process that involves few standardized methodologies, which hampers comparison of metagenomics studies between research groups. In this publication we present MetLab, a new bioinformatics framework that aims to provide scientists with an integrated tool for the experimental design and analysis of viral metagenomes. MetLab supports the design of a metagenomics experiment by estimating the sequencing depth needed for complete coverage of a species. It does so by calculating the probability of coverage with an adaptation of Stevens’ theorem. MetLab also provides several pipelines that simplify the analysis of viral metagenomes, including quality control, assembly and taxonomic binning. We also implement a tool for simulating metagenomics datasets from several sequencing platforms. The overall aim is to provide virologists with an easy-to-use tool for designing, simulating and analyzing viral metagenomes. The results presented here include a benchmark against other existing software, with emphasis on virus detection and execution speed. MetLab is packaged as comprehensive software, readily available for Linux and OS X users at https://github.com/norling/metlab.
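The coverage estimate described above rests on Stevens' theorem: if n arcs, each spanning a fraction a of a circle, are placed uniformly at random, the probability that they cover the whole circle is P = Σ_k (−1)^k C(n, k) (1 − ka)^(n−1), summed over k = 0, 1, … while 1 − ka > 0. The sketch below implements only this classic form, not MetLab's own adaptation. It assumes a circular genome of length `genome_len`, with each read an arc of length `read_len`, so the ends of a linear genome are not modelled. The function names and the `relative_abundance` parameter are hypothetical and exist only for illustration. The alternating sum loses precision badly in ordinary floating point, so the sketch uses exact rational arithmetic from Python's `fractions` module.

```python
from fractions import Fraction
from math import ceil, comb


def stevens_coverage_probability(n_reads: int, read_len: int, genome_len: int) -> Fraction:
    """Probability that n_reads arcs of length read_len, dropped uniformly at
    random on a circle of circumference genome_len, cover the whole circle
    (classic Stevens' theorem). Exact rationals avoid the cancellation that
    the alternating sum suffers in floating point."""
    if n_reads <= 0:
        return Fraction(0)
    a = Fraction(read_len, genome_len)  # fraction of the genome one read spans
    if a >= 1:
        return Fraction(1)  # a single read already spans the whole genome
    total = Fraction(0)
    k = 0
    # Only terms with 1 - k*a > 0 contribute; C(n, k) vanishes for k > n.
    while k <= n_reads and 1 - k * a > 0:
        total += (-1) ** k * comb(n_reads, k) * (1 - k * a) ** (n_reads - 1)
        k += 1
    return total


def reads_needed(read_len: int, genome_len: int, target_prob: float,
                 max_reads: int = 10**6) -> int:
    """Smallest number of reads from one genome whose coverage probability is
    at least target_prob. Coverage probability never decreases as reads are
    added, so doubling followed by binary search finds the threshold."""
    hi = 1
    while stevens_coverage_probability(hi, read_len, genome_len) < target_prob:
        hi *= 2
        if hi > max_reads:
            raise ValueError("target probability not reached within max_reads")
    # When hi > 1, hi // 2 already failed the target, so search (hi // 2, hi].
    lo = hi // 2 + 1 if hi > 1 else 1
    while lo < hi:
        mid = (lo + hi) // 2
        if stevens_coverage_probability(mid, read_len, genome_len) >= target_prob:
            hi = mid
        else:
            lo = mid + 1
    return hi


def total_reads_for_species(read_len: int, genome_len: int, target_prob: float,
                            relative_abundance: float) -> int:
    """Hypothetical helper: scale the per-genome requirement by the fraction of
    the library expected to come from the target species. This simplification
    ignores sampling noise in how many reads actually hit the species."""
    return ceil(reads_needed(read_len, genome_len, target_prob) / relative_abundance)
```

For example, `reads_needed(150, 10_000, 0.99)` estimates how many 150-nt reads a hypothetical 10 kb viral genome needs for a 99% chance of complete coverage. `total_reads_for_species(150, 10_000, 0.99, 0.001)` then translates that figure into a library size under the assumption that the virus contributes 0.1% of all reads. Doubling and then binary search keeps the number of exact (and comparatively expensive) probability evaluations logarithmic in the answer.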
