Computational Tools for the Structural Characterization of Proteins and Their Complexes from Sequence-Evolutionary Data

Structural characterization of proteins and their complexes is fundamental to understanding biological phenomena. Yet the experimental determination of the three‐dimensional (3D) structures of proteins and their complexes remains challenging. To complement experimental approaches, computational methods based on a variety of algorithms and models have been developed to narrow the gap between the number of known sequences and the number of known structures. In this article, we review the most common methodological approaches currently used in the field, highlighting ab initio structure prediction methods and methods for the prediction and structural modeling of protein–protein interfaces (PPIs). We focus in particular on the use of evolutionary information to guide the modeling process.
