The structural effects of mutations can aid in differential phenotype prediction of beta-myosin heavy chain (Myosin-7) missense variants

MOTIVATION: High-throughput sequencing platforms are increasingly used to screen patients with genetic disease for pathogenic mutations, but predicting the effects of mutations remains challenging. We previously developed SAAPdap (Single Amino Acid Polymorphism Data Analysis Pipeline) and SAAPpred (Single Amino Acid Polymorphism Predictor), which use a combination of rule-based structural measures to predict whether a missense genetic variant is pathogenic. Here we investigate whether the same methodology can be used to develop a differential phenotype predictor. Once a mutation has been predicted to be pathogenic, such a predictor distinguishes between phenotypes: in this case, the two major clinical phenotypes, hypertrophic cardiomyopathy (HCM) and dilated cardiomyopathy (DCM), associated with mutations in Myosin-7, the product of the beta-myosin heavy chain gene (MYH7).

RESULTS: A random forest predictor trained on rule-based structural analyses together with structural clustering data gave a Matthews' correlation coefficient (MCC) of 0.53 (accuracy 75%). Post hoc removal of machine learning models that performed particularly badly increased performance (MCC = 0.61, accuracy = 79%). This proof of concept suggests that methods used for pathogenicity prediction can be extended to differential phenotype prediction.

AVAILABILITY AND IMPLEMENTATION: Analyses were implemented in Perl and C and used the Java-based Weka machine learning environment. Please contact the authors for availability.

CONTACTS: andrew@bioinf.org.uk or andrew.martin@ucl.ac.uk

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
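The MCC and accuracy figures reported for the HCM-versus-DCM predictor are both derived from a binary confusion matrix. The sketch below shows the standard definitions. It is not the authors' code (their analyses used Perl, C and Weka), and treating HCM as the "positive" class is an arbitrary assumption. MCC gives the same value whichever class is called positive.

```python
from __future__ import annotations

import math


def mcc_and_accuracy(tp: int, tn: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (MCC, accuracy) for a binary confusion matrix.

    Assumption for illustration: HCM is the positive class and DCM the
    negative class. Swapping the labels leaves MCC unchanged.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # The denominator is the square root of the product of the four marginal totals.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is defined as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return mcc, accuracy
```

Take a set of hypothetical counts (not data from this study): tp=40, tn=35, fp=15, fn=10. For these counts the function returns an accuracy of 0.75 and an MCC of about 0.50. MCC is generally preferred over accuracy for summarizing a binary classifier because it accounts for all four cells of the matrix and is less flattering when the classes are imbalanced.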