DynBench3D, a Web-Resource to Dynamically Generate Benchmark Sets of Large Heteromeric Protein Complexes.

Multi-protein machines are responsible for most cellular tasks, and considerable effort has been invested in the systematic identification and characterization of thousands of these macromolecular assemblies. Unfortunately, the (quasi) atomic details necessary to understand their function are available for only a tiny fraction of the known complexes. The computational biology community is developing strategies that integrate structural data of different kinds, from electron microscopy to X-ray crystallography, to model large molecular machines, as has been done with remarkable success for individual proteins and interactions. However, unlike for binary interactions, there is no reliable gold-standard set of three-dimensional (3D) complexes with which to benchmark the performance of these methodologies and detect their limitations. Here, we present a strategy to dynamically generate non-redundant sets of 3D heteromeric complexes with three or more components. Complex redundancy is defined by two parameters, the sequence identity between components and the fraction of components shared between assemblies; by changing their values, we can create different sets of representative complexes with known 3D structure (i.e., target complexes). Using an identity threshold of 20% and requiring a component overlap fraction of <0.5, we identify 495 unique target complexes, which represent a truly non-redundant set of heteromeric assemblies with known 3D structure. Moreover, for each target complex, we also identify a set of assemblies, of varying degrees of identity and component overlap, that can be readily used as input in a complex modeling exercise (i.e., template subcomplexes). We hope that resources like this will significantly help the development and progress assessment of novel methodologies, as docking benchmarks and blind prediction contests have done. The interactive resource is accessible at https://DynBench3D.irbbarcelona.org.
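
To illustrate how an overlap threshold can yield a non-redundant set, the following minimal Python sketch selects target complexes greedily. It is not the DynBench3D implementation: it assumes that components have already been grouped into sequence clusters at the chosen identity threshold (so each complex is a set of hypothetical cluster labels), and it assumes that component overlap is the number of shared clusters divided by the size of the smaller complex. The abstract does not specify either detail, and the names `component_overlap` and `select_targets` are ours.

```python
from typing import Dict, FrozenSet, List


def component_overlap(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Shared components as a fraction of the smaller complex (assumed definition)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def select_targets(
    complexes: Dict[str, FrozenSet[str]],
    max_overlap: float = 0.5,
    min_components: int = 3,
) -> List[str]:
    """Greedily pick complexes whose overlap with every earlier pick is < max_overlap."""
    # Keep only complexes with enough distinct components (three or more by default).
    candidates = [
        (cid, comps) for cid, comps in complexes.items() if len(comps) >= min_components
    ]
    # Visit larger complexes first; break ties by identifier for reproducibility.
    candidates.sort(key=lambda item: (-len(item[1]), item[0]))

    targets: List[str] = []
    for cid, comps in candidates:
        if all(component_overlap(comps, complexes[t]) < max_overlap for t in targets):
            targets.append(cid)
    return targets


# Toy input: component labels stand for sequence clusters (hypothetical data).
example = {
    "cplx_A": frozenset({"c1", "c2", "c3", "c4"}),
    "cplx_B": frozenset({"c1", "c2", "c3"}),  # fully contained in cplx_A
    "cplx_C": frozenset({"c4", "c5", "c6"}),  # shares one of three components
    "cplx_D": frozenset({"c7", "c8"}),        # fewer than three components
}
print(select_targets(example))  # ['cplx_A', 'cplx_C']
```

In this toy case, `cplx_B` is discarded as redundant with `cplx_A`, `cplx_D` is excluded for having fewer than three components, and raising `max_overlap` above 1.0 would retain `cplx_B` as well. A greedy pass is only one possible choice; the resulting set depends on the visiting order.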
