Improving the in silico assessment of pathogenicity for compensated variants

Understanding the functional consequences of amino-acid replacements is of fundamental importance in medical genetics. Perhaps the most intuitive way to assess the potential pathogenicity of a given human missense variant is to measure the degree of evolutionary conservation of the substituted amino-acid residue, a feature that generally serves as a good proxy for the functional or structural importance of that residue. However, some disease-causing human missense variants, termed compensated variants, are found as the wild-type alleles in orthologous proteins of other mammalian species, where their deleterious effect is presumably offset by compensatory substitutions elsewhere in the protein. Such variants not only challenge this classical view of amino-acid essentiality but also prevent currently available bioinformatic prediction tools from accurately evaluating their functional impact. Compensated variants constitute at least 4% of all known missense variants causing human inherited disease. They therefore represent an important potential source of error, because they are likely to be disproportionately misclassified as benign. The consequent under-reporting of compensated variants is exacerbated in next-generation sequencing studies, where the filtering and prioritization needed to handle the very large number of variants generated tends to exclude them inappropriately. Here we demonstrate the reduced performance of currently available pathogenicity prediction tools when applied to compensated variants and propose an alternative machine-learning approach to assess the likely pathogenicity of this particular type of variant.
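To make the problem concrete, here is a minimal Python sketch of how a purely conservation-based rule treats a compensated variant. It is not the authors' machine-learning method and does not reproduce the logic of any published predictor. It assumes a single alignment column given as a list of residues with the human residue first. The function names, the entropy threshold, and the toy residues are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def column_entropy(column: list[str]) -> float:
    """Shannon entropy (bits) of one alignment column; gaps ('-') are ignored.

    Low entropy means a highly conserved position, the classical proxy for
    functional or structural importance.
    """
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def is_putatively_compensated(column: list[str], variant_aa: str) -> bool:
    """True if the human mutant residue is the wild-type residue in at least
    one non-human ortholog (assumed convention: column[0] is human)."""
    return variant_aa in column[1:]


def toy_conservation_call(column: list[str], variant_aa: str,
                          max_entropy: float = 0.5) -> str:
    """Hypothetical conservation-only rule (not SIFT, PolyPhen-2 or any real tool).

    A substitution already observed as wild-type in an ortholog is treated as
    tolerated; otherwise a conserved (low-entropy) column suggests damage.
    """
    if is_putatively_compensated(column, variant_aa):
        return "likely benign"
    if column_entropy(column) <= max_entropy:
        return "likely damaging"
    return "uncertain"


if __name__ == "__main__":
    # Toy column: human, chimpanzee and macaque carry Arg; mouse and rat carry Gln.
    column = ["R", "R", "R", "Q", "Q"]
    # A hypothetical human R>Q disease variant at this position.
    print(is_putatively_compensated(column, "Q"))
    print(toy_conservation_call(column, "Q"))
```

Because the mutant glutamine is already the wild-type residue in the two rodent sequences, the toy rule treats the substitution as tolerated, even if it causes disease in humans. This is the kind of misclassification of compensated variants as benign that the abstract describes.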
