A Combined Functional Annotation Score for Non-Synonymous Variants

Aims: Next-generation sequencing has opened the possibility of large-scale sequence-based disease association studies. A major challenge in interpreting whole-exome data is predicting which of the discovered variants are deleterious and which are neutral. To address this question in silico, we developed a score called Combined Annotation scoRing toOL (CAROL), which combines information from two bioinformatics tools, PolyPhen-2 and SIFT, to improve the prediction of the effect of non-synonymous coding variants.

Methods: We used a weighted Z method that combines the probabilistic scores of PolyPhen-2 and SIFT. We defined two dataset pairs to train and test CAROL using information from the dbSNP: ‘HGMD-PUBLIC’ and 1000 Genomes Project databases. The training pair comprises 980 positive control (disease-causing) and 4,845 negative control (non-disease-causing) variants. The test pair comprises 1,959 positive and 9,691 negative controls.

Results: CAROL predicts the effect of non-synonymous variants with higher predictive power and accuracy than either individual annotation tool (PolyPhen-2 or SIFT), and it benefits from higher coverage.

Conclusion: Combining annotation tools can help improve the automated prediction of the functional consequences of non-synonymous variants in whole-genome and whole-exome data.
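To illustrate the kind of combination the Methods describe, the following is a minimal sketch of the generic weighted Z (Stouffer) method for merging two probability-like scores. It is not CAROL's published formula: the function name `weighted_z`, the equal weights, and the clamping constant `eps` are all assumptions made for illustration. The sketch also assumes each input has already been converted to a one-sided, p-value-like quantity in which smaller means stronger evidence that the variant is deleterious. SIFT and PolyPhen-2 scores run in opposite directions (low SIFT scores and high PolyPhen-2 scores both indicate a damaging substitution), so one of them would need to be flipped, for example as 1 - score, before being passed in.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def weighted_z(p_values, weights, eps=1e-12):
    """Combine one-sided p-like scores with the weighted Z (Stouffer) method.

    Hypothetical helper for illustration only; it does not reproduce
    CAROL's actual weighting scheme. Returns (combined_z, combined_p).
    """
    if not p_values or len(p_values) != len(weights):
        raise ValueError("p_values and weights must be non-empty and equal in length")

    weighted_sum = 0.0
    for p, w in zip(p_values, weights):
        # Clamp away from 0 and 1, where the inverse normal CDF is undefined.
        p = min(max(p, eps), 1.0 - eps)
        # A small p maps to a large positive z.
        weighted_sum += w * _STD_NORMAL.inv_cdf(1.0 - p)

    # Normalise so the combined statistic is again standard normal under the null.
    combined_z = weighted_sum / sqrt(sum(w * w for w in weights))
    combined_p = 1.0 - _STD_NORMAL.cdf(combined_z)
    return combined_z, combined_p


if __name__ == "__main__":
    # Illustrative inputs only, not values taken from the paper.
    z, p = weighted_z([0.05, 0.05], [1.0, 1.0])
    print(f"combined z = {z:.3f}, combined p = {p:.4f}")
```

With equal weights, two tools that agree reinforce each other, so the combined p-value is smaller than either input. Unequal weights let the more reliable tool dominate, which is the usual reason for preferring a weighted over an unweighted combination.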
