Identification of deleterious mutations within three human genomes.

Each human carries a large number of deleterious mutations. Together, these mutations make a significant contribution to human disease. Identification of deleterious mutations within individual genome sequences could substantially impact an individual's health through personalized prevention and treatment of disease. Yet distinguishing deleterious mutations from the massive number of nonfunctional variants that occur within a single genome is a considerable challenge. Using a comparative genomics data set of 32 vertebrate species, we show that a likelihood ratio test (LRT) can accurately identify a subset of deleterious mutations that disrupt highly conserved amino acids within protein-coding sequences, which are likely to be unconditionally deleterious. The LRT is also able to identify known human disease alleles and performs as well as two commonly used heuristic methods, SIFT and PolyPhen. Application of the LRT to three human genomes reveals 796-837 deleterious mutations per individual, approximately 40% of which are estimated to be at <5% allele frequency. However, the overlap between predictions made by the LRT, SIFT, and PolyPhen is low: 76% of predictions are unique to one of the three methods, and only 5% of predictions are shared across all three methods. Our results indicate that only a small subset of deleterious mutations can be reliably identified, but that this subset provides the raw material for personalized medicine.
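To make the idea of a codon-level likelihood ratio test concrete, here is a deliberately simplified sketch. It is not the model used in this study: the abstract does not specify the substitution model, and this toy version replaces a full codon substitution likelihood with a Poisson count. The inputs are hypothetical: `k` is the number of nonsynonymous substitutions already inferred at one codon across the vertebrate tree, and `t` is the number expected at that codon if it evolved neutrally (for example, scaled from synonymous sites). The null model fixes the ratio `omega = 1` (neutral); the alternative lets `omega` fall anywhere in [0, 1] (constrained). Because `omega = 1` sits on the boundary of the alternative, the p-value uses a 50:50 mixture of chi-square distributions with 0 and 1 degrees of freedom.

```python
import math


def codon_constraint_lrt(k: int, t: float) -> tuple[float, float, float]:
    """Toy likelihood ratio test for constraint at a single codon.

    Assumption (not the study's model): k ~ Poisson(omega * t), where
      k: nonsynonymous substitutions inferred at this codon (hypothetical input)
      t: expected substitutions at this codon under neutrality (hypothetical input)
    Null (neutral):            omega = 1
    Alternative (constrained): 0 <= omega <= 1
    Returns (omega_hat, lrt_statistic, p_value).
    """
    if t <= 0:
        raise ValueError("t must be positive")
    if k < 0:
        raise ValueError("k must be non-negative")

    # Maximum-likelihood omega, restricted to the constrained range [0, 1].
    omega_hat = min(k / t, 1.0)

    # 2 * (logL_alt - logL_null); the k*log(t) and log(k!) terms cancel.
    # For k == 0 the k*log(omega_hat) term is 0 (limit of x*log(x) as x -> 0).
    log_term = k * math.log(omega_hat) if k > 0 else 0.0
    stat = 2.0 * (log_term - omega_hat * t + t)
    stat = max(stat, 0.0)  # guard against tiny negative rounding error

    # Boundary null: 50:50 mixture of chi2_0 and chi2_1.
    if stat == 0.0:
        p_value = 1.0
    else:
        # Survival function of chi2 with 1 df is erfc(sqrt(x / 2)).
        p_value = 0.5 * math.erfc(math.sqrt(stat / 2.0))
    return omega_hat, stat, p_value


if __name__ == "__main__":
    # Hypothetical codons: fully conserved, one change, and more changes than expected.
    for k, t in [(0, 3.0), (1, 3.0), (5, 3.0)]:
        omega_hat, stat, p = codon_constraint_lrt(k, t)
        print(f"k={k} t={t}: omega_hat={omega_hat:.3f} LRT={stat:.3f} p={p:.4f}")
```

In this toy setting, a codon with no nonsynonymous changes over a long neutral branch length gives a small p-value (evidence of constraint), while a codon with as many changes as expected under neutrality gives p = 1. A real analysis would fit a codon model on the phylogeny and correct for the number of codons tested.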