MG-RAST, a Metagenomics Service for Analysis of Microbial Community Structure and Function.

Approaches in molecular biology, particularly those that deal with high-throughput sequencing of entire microbial communities (the field of metagenomics), are rapidly advancing our understanding of the composition and functional content of microbial communities involved in climate change, environmental pollution, human health, biotechnology, and other areas. Metagenomics provides researchers with the most complete picture of the taxonomic (i.e., what organisms are there) and functional (i.e., what are those organisms doing) composition of natively sampled microbial communities, making it possible to perform investigations that include organisms that were previously intractable to laboratory-controlled culturing; currently, these constitute the vast majority of all microbes on the planet. All organisms contained in environmental samples are sequenced in a culture-independent manner, most often with 16S ribosomal amplicon methods to investigate the taxonomic content, or with whole-genome shotgun-based methods to investigate the functional content, of sampled communities. Metagenomics allows researchers to characterize the community composition and functional content of microbial communities, but it cannot show which functional processes are active; however, near-parallel developments in transcriptomics promise a dramatic increase in our knowledge in this area as well. Since 2008, MG-RAST (Meyer et al., BMC Bioinformatics 9:386, 2008) has served as a public resource for annotation and analysis of metagenomic sequence data, providing a repository that currently houses more than 150,000 data sets (containing more than 60 terabase pairs), of which more than 23,000 are publicly available. MG-RAST, or the metagenomics RAST (rapid annotation using subsystems technology) server, makes it possible for users to upload raw metagenomic sequence data in (preferably) fastq or fasta format.
Assessment of sequence quality and annotation with respect to multiple reference databases are performed automatically with minimal input from the user (see Subheading 4 at the end of this chapter for more details). Post-annotation analysis and visualization are also possible, either directly through the web interface or with tools like matR (metagenomic analysis tools for R, covered later in this chapter) that use the MG-RAST API ( http://api.metagenomics.anl.gov/api.html ) to download data from any stage in the MG-RAST processing pipeline. Over the years, MG-RAST has undergone substantial revisions to keep pace with the dramatic growth in the number, size, and types of sequence data that accompany constantly evolving developments in metagenomics and related -omic sciences (e.g., metatranscriptomics).
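
As a rough illustration of programmatic access of the kind the API supports, the following Python sketch builds a request URL for a single metagenome record and parses a JSON response. It is not the chapter's own method: the base URL, the `metagenome` resource path, the `verbosity` parameter, and the placeholder identifier are all assumptions made for illustration and should be checked against the API documentation linked above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the base URL and resource layout below are illustrative only;
# confirm them against the MG-RAST API documentation before use.
API_BASE = "http://api.metagenomics.anl.gov"


def build_metagenome_url(metagenome_id, verbosity="minimal", base=API_BASE):
    """Build the request URL for one metagenome record.

    `metagenome_id` is a hypothetical placeholder accession; `verbosity`
    is assumed to control how much detail the service returns.
    """
    query = urlencode({"verbosity": verbosity})
    return f"{base}/metagenome/{metagenome_id}?{query}"


def fetch_metagenome(metagenome_id, verbosity="minimal", base=API_BASE, timeout=30):
    """Download and decode one record, assuming the service returns JSON."""
    url = build_metagenome_url(metagenome_id, verbosity, base)
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only the URL is printed here, so the example runs without network access.
    print(build_metagenome_url("mgm0000000.0", verbosity="metadata"))
```

Keeping URL construction separate from the network call makes the request easy to inspect before any data are transferred; `fetch_metagenome` can then be called with a real accession once the endpoint details have been confirmed.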
