Metagenomic analysis: the challenge of the data bonanza

Several thousand metagenomes have already been sequenced, and this number is set to grow rapidly in the coming years as the uptake of high-throughput sequencing technologies continues. Hand in hand with this data bonanza comes the computationally overwhelming task of analysis. Here, we describe some of the bioinformatic approaches that metagenomics researchers currently use to analyze their data, the issues they face, and the steps that could be taken to help overcome these challenges.
