More than 9,000,000 Unique Genes in Human Gut Bacterial Community: Estimating Gene Numbers Inside a Human Body

Background: Estimating the number of genes in the human genome has long been an important problem in computational biology. With the newer view of the human as a super-organism, it is also of interest to estimate the number of genes in this human super-organism.

Principal Findings: We present our estimate of the number of genes in the human gut bacterial community, the largest microbial community inside the human super-organism. We obtained 552,700 unique genes from 202 complete human gut bacterial genomes. We then built a novel gene counting model to estimate the total number of genes by combining culture-independent sequence data with those complete genomes. 16S rRNA sequences were used to construct a three-level tree, and a different counting method was introduced for each of the three levels: strain-to-species, species-to-genus, and genus-and-up. The model estimates that the total number of genes is about 9,000,000 after genes with sequence identity of 97% or higher were merged.

Conclusion: By combining the completed genomes currently available with culture-independent sequencing data, we built a model to estimate the number of genes in the human gut bacterial community. The total number of genes is estimated to be about 9 million. Although this number is large, we believe it is an underestimate. This is an initial step toward solving the gene counting problem for the human super-organism, which will remain an open problem in the near future. The list of genomes used in this paper can be found in the supplementary table.
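The merging step mentioned in the abstract (treating genes with identity of 97% or higher as one gene) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `identity` and `count_unique_genes` are hypothetical, the identity measure is a toy position-by-position comparison of equal-length sequences rather than a real alignment, and the greedy clustering strategy is an assumption chosen for simplicity.

```python
def identity(a: str, b: str) -> float:
    """Toy identity: fraction of matching positions between two
    equal-length sequences. A real analysis would use an alignment."""
    if len(a) != len(b) or not a:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def count_unique_genes(genes, threshold=0.97):
    """Greedy merging (an assumed strategy, not taken from the paper):
    a gene is merged into the first representative it matches at or
    above the threshold; otherwise it becomes a new representative."""
    representatives = []
    for gene in genes:
        if not any(identity(gene, rep) >= threshold for rep in representatives):
            representatives.append(gene)
    return len(representatives)


# Hypothetical toy data: four 100-base "genes".
base = "ACGT" * 25                 # reference sequence
near_copy = "T" + base[1:]         # 1 mismatch vs. base: 99% identity, merged
far_copy = "TTTTT" + base[5:]      # 4 mismatches vs. base: 96% identity, kept
unrelated = "G" * 100              # low identity to everything, kept

unique_count = count_unique_genes([base, near_copy, far_copy, unrelated])
print(unique_count)
```

Note that greedy merging depends on input order, so a different ordering of the same genes can yield a slightly different count; this is one reason such counts are best read as estimates.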
