In silico analysis of missense substitutions using sequence‐alignment based methods

Genetic testing for mutations in high‐risk cancer susceptibility genes often reveals missense substitutions that are not easily classified as pathogenic or neutral. Computational analyses are among the methods that can help classify them. Predictions of whether a variant is pathogenic or neutral, or of the probability that it is pathogenic, can be based on: 1) inferences from evolutionary conservation using protein multiple sequence alignments (PMSAs) of the gene of interest, which are applicable to almost any missense sequence variant; and 2) for many variants, structural features of the wild‐type and variant proteins. These in silico methods have improved considerably in recent years. In this work, we review, and in some cases make suggestions regarding: 1) the rationale for using in silico methods to help predict the consequences of missense variants; 2) important aspects of creating PMSAs that are informative for classification; 3) specific features of algorithms that have been used for classification of clinically observed variants; 4) validation studies demonstrating that computational analyses can have predictive values (PVs) of ∼75 to 95%; 5) current limitations of data sets and algorithms that need to be addressed to improve the computational classifiers; and 6) how in silico algorithms can be a part of the “integrated analysis” of multiple lines of evidence to help classify variants. We conclude that carefully validated computational algorithms, in the context of other evidence, can be an important tool for classification of missense variants. Hum Mutat 29(11), 1327–1336, 2008. © 2008 Wiley‐Liss, Inc.
