Large memory high performance computing enables comparison across human gut microbiome of patients with autoimmune diseases and healthy subjects

Microbial communities that live on the outside and inside of the human body dramatically influence human health and diseases. In recent years, major progress has been made in understanding the human microbiome communities through projects such as the Human Microbiome Project (http://commonfund.nih.gov/hmp/), using next generation sequencing technologies and metagenomic approaches. In this paper, we describe a comparative computational analysis of 183 human gut microbiome sequence datasets, drawn from healthy individuals as well as those with autoimmune diseases. About 2.4 TB of Illumina deep sequencing metagenomic data were analyzed using computational workflows we developed, which run multiple steps of data- and computing-intensive analyses such as mapping, sequence assembly, gene identification, clustering and functional annotations. The analyses were carried out on the Gordon supercomputer at the San Diego Supercomputer Center (SDSC), using ~180,000 core hours and tens of TB storage space. Our analysis reveals the detailed microbial composition, dynamics, and functional profiles of the samples and provides new insight into how to correlate microbial profiles with human health and disease states.

[1]  L. Smarr Quantifying your body: a how-to guide from a systems biology perspective. , 2012, Biotechnology journal.

[2]  E. Birney,et al.  Velvet: algorithms for de novo short read assembly using de Bruijn graphs. , 2008, Genome research.

[3]  Zhengwei Zhu,et al.  FR-HIT, a very fast program to recruit metagenomic reads to homologous reference genomes , 2011, Bioinform..

[4]  J. Handelsman Metagenomics: Application of Genomics to Uncultured Microorganisms , 2004, Microbiology and Molecular Biology Reviews.

[5]  B. Roe,et al.  A core gut microbiome in obese and lean twins , 2008, Nature.

[6]  P. Bork,et al.  A human gut microbial gene catalogue established by metagenomic sequencing , 2010, Nature.

[7]  W. J. Kent,et al.  BLAT--the BLAST-like alignment tool. , 2002, Genome research.

[8]  Elaine R. Mardis,et al.  A decade’s perspective on DNA sequencing technology , 2011, Nature.

[9]  Zhengwei Zhu,et al.  CD-HIT: accelerated for clustering the next-generation sequencing data , 2012, Bioinform..

[10]  Thomas L. Madden,et al.  Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. , 1997, Nucleic acids research.

[11]  Cole Trapnell,et al.  Ultrafast and memory-efficient alignment of short DNA sequences to the human genome , 2009, Genome Biology.

[12]  Richard Durbin,et al.  Sequence analysis Fast and accurate short read alignment with Burrows – Wheeler transform , 2009 .

[13]  M. Pop,et al.  Metagenomic Analysis of the Human Distal Gut Microbiome , 2006, Science.

[14]  Adam Godzik,et al.  Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences , 2006, Bioinform..

[15]  Sean R Eddy,et al.  A new generation of homology search tools based on probabilistic inference. , 2009, Genome informatics. International Conference on Genome Informatics.

[16]  Huanming Yang,et al.  De novo assembly of human genomes with massively parallel short read sequencing. , 2010, Genome research.

[17]  Richard Durbin,et al.  Fast and accurate long-read alignment with Burrows–Wheeler transform , 2010, Bioinform..

[18]  T. Itoh,et al.  MetaGeneAnnotator: Detecting Species-Specific Patterns of Ribosomal Binding Site for Precise Gene Prediction in Anonymous Prokaryotic and Phage Genomes , 2008, DNA research : an international journal for rapid publication of reports on genes and genomes.

[19]  Lu Wang,et al.  The NIH Human Microbiome Project. , 2009, Genome research.

[20]  Steven J. M. Jones,et al.  Abyss: a Parallel Assembler for Short Read Sequence Data Material Supplemental Open Access , 2022 .

[21]  Haixu Tang,et al.  FragGeneScan: predicting genes in short and error-prone reads , 2010, Nucleic acids research.

[22]  John C. Wooley,et al.  Ultrafast clustering algorithms for metagenomic sequence analysis , 2012, Briefings Bioinform..

[23]  T. Takagi,et al.  MetaGene: prokaryotic gene finding from environmental genome shotgun sequences , 2006, Nucleic acids research.

[24]  E. Purdom,et al.  Diversity of the Human Intestinal Microbial Flora , 2005, Science.

[25]  Steven L Salzberg,et al.  Fast gapped-read alignment with Bowtie 2 , 2012, Nature Methods.

[26]  Katherine H. Huang,et al.  Structure, Function and Diversity of the Healthy Human Microbiome , 2012, Nature.