Predicted Molecular Effects of Sequence Variants Link to System Level of Disease

Developments in experimental and computational biology are advancing our understanding of how protein sequence variation affects the molecular function of proteins. However, the leap from the micro level of molecular function to the macro level of the whole organism, e.g. disease, remains largely out of reach. Here, we present new results that reinforce earlier work suggesting links between molecular function and disease. We focused on non-synonymous single nucleotide variants, also referred to as single amino acid variants (SAVs). Building upon OMIA (Online Mendelian Inheritance in Animals), we introduced a curated set of 117 disease-causing SAVs in animals. Methods optimized to capture effects on molecular function often correctly predict human (OMIM, Online Mendelian Inheritance in Man) and animal (OMIA) Mendelian disease-causing variants. We also predicted the effects of human disease-causing variants in the mouse model, i.e. we introduced OMIM SAVs into the corresponding mouse orthologs. Overall, fewer variants were predicted to have an effect in the model organism than in the original organism. Our results, along with other recent studies, demonstrate that predictions of molecular effects capture some important aspects of disease. Thus, in silico methods focusing on the micro level of molecular function can help us understand the macro system level of disease.
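The mouse-model step (placing human OMIM SAVs into mouse orthologs and comparing how many are predicted to have an effect) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes a pairwise human-mouse alignment is already available as two gapped rows of equal length, writes SAVs in the common X123Y notation, and treats a variant as transferable only when the aligned mouse residue is identical to the human wild-type residue (a hypothetical filter). The helper names (`parse_sav`, `map_position`, `transfer_sav`, `fraction_with_effect`) and the default score cutoff of 0 are illustrative assumptions; the actual scores would come from an external effect predictor.

```python
import re
from typing import Optional, Sequence, Tuple

# SAV in "X123Y" notation: wild-type residue, 1-based position, mutant residue.
AA = "ACDEFGHIKLMNPQRSTVWY"
SAV_PATTERN = re.compile(rf"^([{AA}])([1-9]\d*)([{AA}])$")


def parse_sav(sav: str) -> Tuple[str, int, str]:
    """Split e.g. "R123W" into ("R", 123, "W")."""
    match = SAV_PATTERN.match(sav)
    if match is None:
        raise ValueError(f"not a SAV in X123Y notation: {sav!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def map_position(aligned_src: str, aligned_dst: str, src_pos: int) -> Optional[int]:
    """Map a 1-based residue position from the source to the destination sequence.

    Both arguments are rows of the same pairwise alignment ('-' marks a gap).
    Returns None if the source residue is aligned to a gap or does not exist.
    """
    if len(aligned_src) != len(aligned_dst):
        raise ValueError("alignment rows must have equal length")
    src_i = dst_i = 0  # residues seen so far in each ungapped sequence
    for a, b in zip(aligned_src, aligned_dst):
        if a != "-":
            src_i += 1
        if b != "-":
            dst_i += 1
        if a != "-" and src_i == src_pos:
            return dst_i if b != "-" else None
    return None


def transfer_sav(sav: str, aligned_human: str, aligned_mouse: str) -> Optional[str]:
    """Return the equivalent SAV in the mouse ortholog, or None if not transferable."""
    wild_type, pos, mutant = parse_sav(sav)
    human_seq = aligned_human.replace("-", "")
    if pos > len(human_seq) or human_seq[pos - 1] != wild_type:
        raise ValueError(f"{sav} does not match the human sequence")
    mouse_pos = map_position(aligned_human, aligned_mouse, pos)
    if mouse_pos is None:
        return None  # human residue is aligned to a gap in mouse
    mouse_seq = aligned_mouse.replace("-", "")
    if mouse_seq[mouse_pos - 1] != wild_type:
        return None  # assumed filter: native residue must be identical
    return f"{wild_type}{mouse_pos}{mutant}"


def fraction_with_effect(scores: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of variants whose predicted score exceeds the (assumed) cutoff."""
    if not scores:
        raise ValueError("no scores given")
    return sum(score > threshold for score in scores) / len(scores)


if __name__ == "__main__":
    # Toy alignment (not real data): the mouse row has an extra residue after Met.
    print(transfer_sav("R3W", "M-KRLV", "MAKRLV"))  # prints "R4W"
```

Requiring an identical native residue keeps the comparison like-for-like: the human and mouse predictions then describe the same amino acid substitution, so any difference in `fraction_with_effect` between the two organisms reflects the surrounding protein context rather than a different change.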
