Multiple Property Tolerance Analysis for the Evaluation of Missense Mutations

Computational prediction of the impact of a mutation on protein function is still not accurate enough for clinical diagnostics without additional analysis by a human expert. Sequence alignment-based methods have been used extensively, but their results depend heavily on the quality of the input alignments and the choice of sequences. Incorporating structural information alongside alignments improves prediction accuracy. Here, we present a mutation prediction method based on the conservation of amino acid properties, Multiple Properties Tolerance Analysis (MuTA), and a new strategy, MuTA/S, that incorporates the solvent-accessible surface (SAS) property into MuTA. Instead of combining multiple features by machine learning or mathematical methods, MuTA/S uses an intuitive strategy: the residues of a protein are divided into groups, and the set of properties used is adjusted for each group. The results for LacI, lysozyme, and HIV protease show that MuTA performs as well as the widely used SIFT algorithm, while MuTA/S outperforms SIFT and MuTA by 2%–25% in terms of prediction accuracy. Incorporating the SAS term alone significantly reduces the dependence of overall prediction accuracy on the alignment. MuTA/S also offers a general way to incorporate other structural features and prior knowledge, which may lead to more accurate predictions.
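The grouping strategy can be illustrated with a minimal sketch. It is not the published MuTA or MuTA/S implementation; every concrete choice in it is an assumption made for illustration. The two properties (Kyte-Doolittle hydropathy and formal side-chain charge, with histidine treated as neutral), the min/max tolerance rule, the 0.25 relative-SAS cutoff, the "buried"/"exposed" groups, and the property set assigned to each group are all hypothetical.

```python
# Kyte-Doolittle hydropathy scale (one possible property; assumed for this sketch).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Formal side-chain charge; histidine is treated as neutral (assumption).
CHARGE = {aa: 0 for aa in HYDROPATHY}
CHARGE.update({"K": 1, "R": 1, "D": -1, "E": -1})

PROPERTIES = {"hydropathy": HYDROPATHY, "charge": CHARGE}

# Hypothetical grouping: which properties are checked for which residue group.
GROUP_PROPERTIES = {
    "buried": ("hydropathy", "charge"),
    "exposed": ("charge",),
}
BURIED_CUTOFF = 0.25  # assumed relative-SAS threshold, not from the paper


def tolerated_range(column, table):
    """Min and max of a property over the residues seen in an alignment column."""
    values = [table[aa] for aa in column if aa in table]  # skips gaps such as "-"
    return min(values), max(values)


def predict(column, mutant, relative_sas=None):
    """Classify a substitution as 'tolerated' or 'deleterious'.

    Without relative_sas, every property is checked (alignment-only mode).
    With relative_sas, the residue is assigned to a group first and only
    that group's properties are checked (SAS-aware mode).
    """
    if relative_sas is None:
        names = tuple(PROPERTIES)
    else:
        group = "buried" if relative_sas < BURIED_CUTOFF else "exposed"
        names = GROUP_PROPERTIES[group]
    for name in names:
        low, high = tolerated_range(column, PROPERTIES[name])
        if not low <= PROPERTIES[name][mutant] <= high:
            return "deleterious"
    return "tolerated"


if __name__ == "__main__":
    column = "LIVLM"  # toy alignment column, invented for the example
    print(predict(column, "A"))                    # alignment-only mode
    print(predict(column, "A", relative_sas=0.6))  # treated as exposed
    print(predict(column, "K", relative_sas=0.6))  # charge change at an exposed site
```

In this toy setup, the same substitution can be classified differently depending on the group the residue falls into, which is the effect the grouping strategy is meant to capture. The property tables, cutoff, and group definitions would need to be replaced with those of the actual method before any real use.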
