The interplay between evolution, regulation and tissue specificity in the Human Hereditary Diseasome

Background: Human disease genes can be distinguished from essential (embryonically lethal) and non-disease genes using gene attributes. Such attributes include gene age, tissue specificity of expression, regulatory capacity, sequence length, rate of sequence variation and capacity for interaction. The resulting information has been used to inform data mining approaches seeking to identify novel disease genes. Given the dynamic nature of this field and the rapid rise in relevant information, we have chosen to perform a single integrated mining approach to explore relationships among gene attributes and thereby characterise evolutionary trends associated with disease genes.

Results: An all-against-all cross comparison of 2,522 disease gene attributes revealed significant relationships between the age, disease association and expression pattern of genes and the tissues within which they are expressed. We found that the over-representation of disease genes among old genes holds for tissue-specific genes, but the correlation between age and disease association vanished when conditioning on tissue specificity. Of the 32 tissues studied, the genes expressed in pancreas are on average older than the genes expressed in any other tissue, while the testis expressed the lowest proportion of old genes. Following a focussed analysis of the impact of regulatory apparatus on the evolution of disease genes, we show that regulators, comprising transcription factors and post-translationally modified proteins, are over-represented among ancient disease genes. In addition, we show that the proportion of regulator genes is affected by gene age among disease genes and by tissue specificity among non-disease genes. Finally, using data on 55,606 true-positive gene interactions, we find that old disease genes interact with other old disease genes, and that interacting new genes interact with genes originating from higher phylostrata.

Conclusion: This study supports the non-random nature of the human diseasome. We have identified a variety of distinct features and correlations to other molecular attributes that can be used to distinguish the set of disease-causing genes. This was achieved by harnessing the power of mining large-scale datasets from OMIM and other databases. Ultimately, such knowledge may contribute to the identification of novel human disease genes and an enhanced understanding of human biology.
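As an illustration of what testing over-representation while conditioning on tissue specificity can look like, the following minimal sketch runs a one-sided hypergeometric (Fisher-type) enrichment test separately within each stratum. It is not the authors' method, which the abstract does not specify; the function name `enrichment_p_value`, the stratum labels and all counts are hypothetical and chosen only to make the example runnable.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """Upper-tail hypergeometric probability P(X >= k).

    N: total genes in the stratum
    K: disease genes in the stratum
    n: old genes in the stratum
    k: genes that are both old and disease-associated
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probability of every overlap at least as large as the one observed.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical toy counts, NOT taken from the study.
strata = {
    "tissue_specific": {"k": 30, "n": 50, "K": 40, "N": 100},
    "housekeeping": {"k": 10, "n": 50, "K": 20, "N": 100},
}

# Testing within each stratum is one simple way to condition on tissue specificity.
for name, counts in strata.items():
    print(name, enrichment_p_value(**counts))
```

With these toy numbers the overlap between old and disease genes is well above its expected value in the first stratum and exactly at its expected value in the second, so an association can be present in one stratum and absent in the other.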
