Similarly Strong Purifying Selection Acts on Human Disease Genes of All Evolutionary Ages

A number of studies have shown that recently created genes differ in many respects from genes created in the deep evolutionary past. Here, we determined the age of emergence and the propensity for gene loss (PGL) of all human protein-coding genes and compared disease genes with non-disease genes in terms of their evolutionary rate, strength of purifying selection, mRNA expression, and genetic redundancy. Among non-disease genes, those that are older and less prone to loss have been evolving 1.5- to 3-fold more slowly between humans and chimpanzees than young non-disease genes, whereas Mendelian disease genes have been evolving very slowly regardless of their age and PGL. Complex disease genes showed an intermediate pattern. Disease genes also have higher mRNA expression heterogeneity across multiple tissues than non-disease genes, regardless of age and PGL. Young and middle-aged disease genes have fewer similar paralogs than non-disease genes of the same age. We reason that genes are more likely to be involved in human disease if they are under strong functional constraint, are expressed heterogeneously across tissues, and lack genetic redundancy. Young human genes that have been evolving under strong constraint between humans and chimpanzees might also be enriched for genes that encode important primate-specific or even human-specific functions.
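The abstract does not say how gene age and PGL were computed. As a rough illustration only, the sketch below assigns a phylostratigraphy-style age to a gene: the age is set by the most distant outgroup species that has a detectable homolog. It then counts the losses implied under Dollo parsimony on a simplified, ladder-shaped species tree. The species panel, the function name `age_and_losses`, and the use of a loss fraction as a PGL proxy are all hypothetical assumptions. This is not the authors' procedure.

```python
from typing import Dict, List, Tuple

# Hypothetical outgroup panel, ordered from the most recently diverged
# (closest to human) to the most anciently diverged. Illustrative only.
OUTGROUPS: List[str] = [
    "chimpanzee", "macaque", "mouse", "chicken", "zebrafish", "fly", "yeast",
]


def age_and_losses(
    present: Dict[str, bool], outgroups: List[str] = OUTGROUPS
) -> Tuple[int, int, float]:
    """Return (age_class, losses, loss_fraction) for one human gene.

    age_class: 0 if no outgroup has a homolog (human-specific); otherwise
        1 + the index of the most distant outgroup that has a homolog.
    losses: under Dollo parsimony (a gene arises once but may be lost many
        times) on an assumed ladder-shaped tree, every outgroup that branched
        off after the gene arose but lacks a homolog implies one separate loss.
    loss_fraction: losses divided by the number of such outgroups; a crude,
        hypothetical stand-in for a propensity-for-gene-loss score.
    """
    deepest = -1
    for i, species in enumerate(outgroups):
        if present.get(species, False):
            deepest = i  # remember the most distant species with a homolog
    if deepest < 0:
        return 0, 0, 0.0  # no homolog in any outgroup
    # Outgroups that diverged after the gene's origin (closer to human).
    at_risk = outgroups[:deepest]
    losses = sum(1 for species in at_risk if not present.get(species, False))
    fraction = losses / len(at_risk) if at_risk else 0.0
    return deepest + 1, losses, fraction


if __name__ == "__main__":
    profile = {"chimpanzee": True, "macaque": False, "mouse": True,
               "chicken": False, "zebrafish": True}
    print(age_and_losses(profile))  # prints (5, 2, 0.5)
```

The ladder-shaped tree keeps loss counting trivial. On a real species tree with multi-species clades, losses would be counted per branch instead. A missing hit can also reflect failed homology detection rather than a true loss.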