Building a Hybrid Physical-Statistical Classifier for Predicting the Effect of Variants Related to Protein-Drug Interactions

A key issue in drug design is how population variation affects drug efficacy by altering binding affinity (BA) in different individuals, which is also an essential consideration for government regulators. Ideally, we would like to evaluate the BA perturbations of millions of single-nucleotide variants (SNVs). However, only hundreds of protein-drug complexes with SNVs have experimentally characterized BAs, which is too small a gold standard for straightforward statistical model training. We therefore take a hybrid approach, using physically based calculations to bootstrap the parameterization of a full model. Specifically, we perform 3D structure-based docking on ∼10,000 SNVs that modify known protein-drug complexes to construct a pseudo gold standard. We then use this augmented set of BAs to train a statistical model that combines structure, ligand, and sequence features, and we illustrate how it can be applied to millions of SNVs. Finally, we show that our model has good cross-validated performance (97% AUROC) and that it can also be validated against orthogonal ligand-binding data.
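The abstract outlines three steps (docking-derived pseudo labels, a statistical classifier trained on them, and a cross-validated AUROC) without fixing the details. The Python sketch below illustrates those steps under stated assumptions and is not the authors' pipeline. It assumes docking scores are binding energies in kcal/mol where more negative means tighter binding, uses a hypothetical 1.0 kcal/mol cutoff (`DELTA_CUTOFF`) to turn wild-type/mutant score differences into binary labels, and substitutes a plain logistic regression for whatever model the authors actually trained. The feature vectors stand in for the structure, ligand and sequence features; all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

# HYPOTHETICAL cutoff (kcal/mol), not taken from the paper: a variant counts as
# "disruptive" when docking predicts its binding energy rises by more than this.
DELTA_CUTOFF = 1.0

Model = Tuple[List[float], float]  # (weights, bias)


def pseudo_labels(ba_wt: Sequence[float], ba_mut: Sequence[float],
                  cutoff: float = DELTA_CUTOFF) -> List[int]:
    """Turn wild-type/mutant docking energies into 0/1 pseudo labels.

    Energies are assumed to be negative-is-better, so delta = mut - wt > cutoff
    means the SNV weakens binding and is labelled 1.
    """
    return [1 if (m - w) > cutoff else 0 for w, m in zip(ba_wt, ba_mut)]


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _linear(model: Model, x: Sequence[float]) -> float:
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 500) -> Model:
    """Batch gradient-descent logistic regression (a stand-in classifier)."""
    n, d = len(X), len(X[0])
    model: Model = ([0.0] * d, 0.0)
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = _sigmoid(_linear(model, xi)) - yi  # gradient of the log-loss
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w, b = model
        model = ([wj - lr * g / n for wj, g in zip(w, gw)], b - lr * gb / n)
    return model


def cross_validated_auroc(X: Sequence[Sequence[float]], y: Sequence[int],
                          k: int = 5, seed: int = 0) -> float:
    """k-fold CV: score every SNV with a model that never saw it, then pool."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = [0.0] * len(X)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_logistic([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            scores[i] = _linear(model, X[i])  # ranking only, so no sigmoid needed
    return auroc(y, scores)
```

Pooling the out-of-fold scores into a single AUROC is one common convention; averaging per-fold AUROCs is another. The folds here are not stratified, and the O(n²) pairwise AUROC is only meant for small illustrative sets.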
