Kinact: a computational approach for predicting activating missense mutations in protein kinases

Abstract

Protein phosphorylation is tightly regulated because of its vital role in many cellular processes. While gain-of-function mutations leading to constitutive activation of protein kinases are known to be driver events in many cancers, identifying these mutations has proven challenging. Here we present Kinact, a novel machine learning approach for predicting kinase-activating missense mutations using information from sequence and structure. By adapting our graph-based signatures, Kinact represents both structural and sequence information, which is used as evidence to train predictive models. We show that combining structural and sequence features significantly improves overall accuracy compared with considering either primary or tertiary structure alone, highlighting their complementarity. Kinact achieved a precision of 87% and an area under the ROC curve (AUC) of 0.89 on 10-fold cross-validation, and a precision of 94% and an AUC of 0.92 on blind tests, outperforming well-established tools (P < 0.01). We further show that Kinact performs equally well on homology models built using templates with sequence identity as low as 33%. Kinact is freely available as a user-friendly web server at http://biosig.unimelb.edu.au/kinact/.
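For readers less familiar with the two metrics reported above, the following is a minimal sketch of how precision and the area under the ROC curve can be computed for a binary classifier. It is not Kinact's evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration. The AUC is computed through its rank interpretation, that is, the probability that a randomly chosen positive (activating) example receives a higher score than a randomly chosen negative one, with ties counted as one half.

```python
def precision(y_true, y_pred):
    """Fraction of predicted positives that are truly positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outscores a negative.

    Compares every positive/negative pair; ties count as 0.5.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = activating, 0 = non-activating.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.1]

# Assumed decision threshold of 0.5 for turning scores into labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("precision:", precision(y_true, y_pred))
print("AUC:", roc_auc(y_true, scores))
```

Precision depends on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is one reason the two are usually reported together.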
