Multivariate Analysis of Functional Metagenomes

Metagenomics is a primary tool for describing microbial and viral communities. The sheer volume of data generated for each metagenome makes it difficult to identify key differences in function and taxonomy between communities. Here we discuss the application of seven data mining and statistical analyses by comparing and contrasting the metabolic functions of 212 microbial metagenomes within and between 10 environments. Not all approaches are appropriate for all questions, and researchers should choose the approach that addresses their question. This work demonstrates the use of each approach: for example, random forests provided a robust and informative description of both the clustering of metagenomes and the metabolic processes that were important in separating microbial communities from different environments. All analyses identified the presence of phage genes within the microbial community as a predictor of whether that community was host-associated or free-living. Several analyses also identified the subtle differences that occur within environments, such as those seen among different regions of the marine environment.
