A computational framework for integrative analysis of large microbial genomics data

The availability of large amounts of genome sequence data from natural microbial consortia enables integrated analyses that resolve the genetic and metabolic potential of microbial communities, establish how functions are partitioned within and among populations, and reveal how microbial communities evolve and adapt across multiple environments. In this paper, we propose a computational framework for the comparative analysis of microbial genomes. The framework is designed to investigate genome context patterns of microbial diversity. Applying it to three environments (human gut, soil, and marine), we demonstrate that the framework can identify functional modules and evaluate the functional roles of those modules in microbial communities in response to environmental change. We find that gene networks differ among microbial communities living in different environments. We also show that the modules identified by our framework can be computationally annotated to study their biological functions.
