Predicting deleterious amino acid substitutions.

Many missense substitutions are identified in single nucleotide polymorphism (SNP) data and large-scale random mutagenesis projects. Each amino acid substitution potentially affects protein function. We have constructed a tool, SIFT (which sorts intolerant from tolerant substitutions), that uses sequence homology to predict whether a substitution affects protein function and classifies each substitution as tolerated or deleterious. In three test cases, a higher proportion of the substitutions that SIFT predicts to be deleterious give an affected phenotype than of those predicted to be deleterious by substitution scoring matrices. Using SIFT before mutagenesis studies could reduce the number of functional assays required and yield a higher proportion of affected phenotypes. SIFT may also be used to identify plausible disease candidates among the SNPs that cause missense substitutions.
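
As a rough illustration of how sequence homology can separate tolerated from deleterious substitutions, the sketch below scores a candidate amino acid against one column of an alignment of homologous sequences. It is a simplified stand-in, not the published SIFT procedure: the function names, the flat pseudocount, and the 0.05 cutoff are all assumptions made for this example, and the toy alignment column is invented.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_probabilities(column, pseudocount=0.1):
    """Estimate amino acid probabilities at one alignment position.

    `column` holds the residues observed at that position across
    homologous sequences. The flat pseudocount is an assumption of
    this sketch, used so unobserved residues get a small nonzero value.
    """
    counts = Counter(res for res in column if res in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}

def classify_substitution(column, new_residue, cutoff=0.05):
    """Label a substitution as 'tolerated' or 'deleterious'.

    The probability of the new residue is normalized by the most
    frequent residue in the column; values below `cutoff`
    (an assumed threshold) are called deleterious.
    """
    probs = column_probabilities(column)
    score = probs[new_residue] / max(probs.values())
    label = "tolerated" if score >= cutoff else "deleterious"
    return label, score

# Hypothetical column: 30 Leu, 8 Ile, 2 Val among 40 homologs.
toy_column = "L" * 30 + "I" * 8 + "V" * 2
for residue in ("I", "V", "D"):
    label, score = classify_substitution(toy_column, residue)
    print(residue, label, round(score, 3))
```

In this toy column, residues seen among the homologs (Ile, Val) score above the cutoff, while an unobserved residue such as Asp falls below it. A real analysis would also need a homolog search, an alignment, and a weighting scheme for the sequences, none of which is shown here.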
