A Combined Functional Annotation Score for Non-Synonymous Variants

Aims: Next-generation sequencing has made large-scale, sequence-based disease association studies possible. A major challenge in interpreting whole-exome data is predicting which of the discovered variants are deleterious and which are neutral. To address this question in silico, we have developed a score called Combined Annotation scoRing toOL (CAROL), which combines information from two bioinformatics tools, PolyPhen-2 and SIFT, to improve prediction of the effect of non-synonymous coding variants.

Methods: We used a weighted Z method to combine the probabilistic scores of PolyPhen-2 and SIFT. We defined two dataset pairs to train and test CAROL, using information from the dbSNP: ‘HGMD-PUBLIC’ and 1000 Genomes Project databases. The training pair comprises 980 positive control (disease-causing) and 4,845 negative control (non-disease-causing) variants. The test pair comprises 1,959 positive and 9,691 negative controls.

Results: CAROL has higher predictive power and accuracy for the effect of non-synonymous variants than either individual annotation tool (PolyPhen-2 or SIFT) and benefits from higher coverage.

Conclusion: Combining annotation tools can help improve automated prediction of the functional consequences of non-synonymous variants in whole-genome and whole-exome data.
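The weighted Z method named in the Methods can be illustrated with a minimal Python sketch. This is a generic weighted Z (Stouffer-type) combination, not CAROL's published formula: the function names, the equal default weights, the clipping constant `eps`, and the use of `1 - SIFT score` to put both tools on a "higher means more damaging" scale are all assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def weighted_z(probabilities, weights, eps=1e-6):
    """Combine probabilities in (0, 1) with a generic weighted Z method.

    Each probability is mapped to a Z value with the inverse normal CDF,
    the Z values are combined as sum(w * z) / sqrt(sum(w ** 2)), and the
    result is mapped back to a probability.
    """
    if not probabilities or len(probabilities) != len(weights):
        raise ValueError("need one weight per probability")
    z_sum = 0.0
    w_sq_sum = 0.0
    for p, w in zip(probabilities, weights):
        # Clip to avoid infinite Z values at exactly 0 or 1 (assumed choice).
        p = min(max(p, eps), 1.0 - eps)
        z_sum += w * _STD_NORMAL.inv_cdf(p)
        w_sq_sum += w * w
    if w_sq_sum == 0.0:
        raise ValueError("at least one weight must be non-zero")
    return _STD_NORMAL.cdf(z_sum / sqrt(w_sq_sum))


def combined_deleteriousness(polyphen2_prob, sift_score, weights=(1.0, 1.0)):
    """Hypothetical helper: combine one PolyPhen-2 and one SIFT score.

    Assumes low SIFT scores indicate a damaging substitution, so the SIFT
    score is inverted before combining. The weights are placeholders.
    """
    return weighted_z([polyphen2_prob, 1.0 - sift_score], weights)


if __name__ == "__main__":
    # Both tools point towards "damaging": the combined score exceeds either input.
    print(combined_deleteriousness(0.9, 0.1))
    # The tools disagree symmetrically: the combined score sits near 0.5.
    print(combined_deleteriousness(0.9, 0.9))
```

With equal weights, two agreeing scores reinforce each other and two symmetric, opposing scores cancel. Any real application would need weights and a decision threshold fitted on training data such as the control sets described above.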

Fiona Cunningham | Eleftheria Zeggini | Chris Joyce | Margarida C Lopes | Graham R S Ritchie | Sally L John | Jennifer Asimit
