Interactome INSIDER: a multi-scale structural interactome browser for genomic studies

Protein interactions underlie nearly all known cellular functions, making knowledge of their binding conformations paramount to understanding the physical workings of the cell. Studying binding conformations has allowed scientists to explore some of the mechanistic underpinnings of disease caused by disruption of protein interactions. However, because experimentally determined interaction structures are available for only a small fraction of the known interactome, such inquiry has largely been excluded from functional genomic studies of the human interactome and from broad observations of the inner workings of disease. Here we present Interactome INSIDER, an information center for genomic studies built on the first full-interactome map of human interaction interfaces. We applied a new, unified framework to predict interaction interfaces for 184,605 protein interactions with previously unresolved interfaces in human and 7 model organisms, including the entire experimentally determined human binary interactome. We find that predicted interfaces share several known functional properties of interfaces, including enrichment for disease mutations and recurrent cancer mutations, suggesting that they are applicable to functional genomic studies. We also performed 2,164 de novo mutagenesis experiments and show that mutations in predicted interface residues disrupt interactions at a rate similar to mutations in known interface residues and at a much higher rate than mutations outside predicted interfaces. To spur functional genomic studies in the human interactome, Interactome INSIDER (http://interactomeinsider.yulab.org) allows users to explore known population variants, disease mutations, and somatic cancer mutations, or to upload their own sets of mutations and test for enrichment at the level of protein domains, residues, and 3D atomic clustering in known and predicted interaction interfaces.
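To illustrate what residue-level enrichment in an interface means, the sketch below asks whether the distinct mutated residues of one protein fall in its interface more often than chance would predict. It is a minimal sketch, not Interactome INSIDER's implementation: the abstract does not state which statistical test the server uses, so the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) and the helper name `interface_enrichment` are assumptions chosen for illustration. Residues are 1-based positions, and each mutated residue is counted once.

```python
from math import comb


def interface_enrichment(protein_length, interface_residues, mutated_residues):
    """Return (fold_enrichment, p_value) for mutated residues in an interface.

    Hypothetical helper for illustration only. Assumes a one-sided
    hypergeometric test over distinct residues; not the INSIDER method.
    """
    interface = set(interface_residues)
    mutated = set(mutated_residues)  # recurrent hits on one residue count once
    for pos in interface | mutated:
        if not 1 <= pos <= protein_length:
            raise ValueError(f"residue {pos} outside 1..{protein_length}")
    if not mutated or not interface:
        raise ValueError("need at least one mutated and one interface residue")

    n_total = protein_length            # population size (all residues)
    n_iface = len(interface)            # "successes" in the population
    n_mut = len(mutated)                # number of draws (mutated residues)
    observed = len(mutated & interface)  # mutated residues inside the interface

    # Fold enrichment: observed interface fraction vs. background fraction.
    fold = (observed / n_mut) / (n_iface / n_total)

    # One-sided p-value: P(X >= observed) for X ~ Hypergeom(N, K, n).
    upper = min(n_iface, n_mut)
    tail = sum(
        comb(n_iface, i) * comb(n_total - n_iface, n_mut - i)
        for i in range(observed, upper + 1)
    )
    p_value = tail / comb(n_total, n_mut)
    return fold, p_value


if __name__ == "__main__":
    # Toy protein of 10 residues with interface residues 1-3.
    fold, p = interface_enrichment(10, {1, 2, 3}, [1, 2, 2, 8])
    print(f"fold enrichment = {fold:.3f}, one-sided p = {p:.4f}")
```

In the toy example, two of three distinct mutated residues lie in a 3-residue interface of a 10-residue protein, giving a fold enrichment of 20/9 (about 2.22) and p = 22/120 (about 0.183). Collapsing recurrent mutations to distinct residues keeps the hypergeometric model valid; a test that weights recurrence would need a different null model.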
