The MULTICOM toolbox for protein structure prediction

Background: As genome sequencing becomes routine in biomedical research, the total number of protein sequences is increasing exponentially, recently reaching over 108 million. However, only a tiny portion of these proteins (~75,000, or < 0.07%) have tertiary structures solved by experimental techniques. The gap between protein sequence and structure continues to widen rapidly because the throughput of genome sequencing is much higher than that of protein structure determination. Computational tools for predicting protein structure and structural features from protein sequences are crucial for making use of this vast repository of protein resources.

Results: To meet this need, we have developed the comprehensive MULTICOM toolbox, a set of protein structure and structural feature prediction tools. These tools cover secondary structure prediction, solvent accessibility prediction, disorder region prediction, domain boundary prediction, contact map prediction, disulfide bond prediction, beta-sheet topology prediction, fold recognition, multiple template combination and alignment, template-based tertiary structure modeling, protein model quality assessment, and mutation stability prediction.

Conclusions: These tools have been rigorously tested by many users over the last several years and/or during the last three rounds of the Critical Assessment of Techniques for Protein Structure Prediction (CASP7-9) from 2006 to 2010, achieving state-of-the-art or near state-of-the-art performance. To facilitate bioinformatics research and technological development in the field, we have made the MULTICOM toolbox freely available as web services and/or software packages for academic use and scientific research. It is available at http://sysbio.rnet.missouri.edu/multicom_toolbox/.
