Improving the accuracy of protein secondary structure prediction using structural alignment

Background: The accuracy of protein secondary structure prediction has steadily improved over the past 30 years. Many secondary structure prediction methods now routinely achieve an accuracy (Q3) of about 75%. We believe this accuracy could be further improved by including structure (as opposed to sequence) database comparisons as part of the prediction process. Indeed, given the large size of the Protein Data Bank (>35,000 sequences), the probability of a newly identified sequence having a structural homologue is actually quite high.

Results: We have developed a method that performs structure-based sequence alignments as part of the secondary structure prediction process. By mapping the structure of a known homologue (sequence identity >25%) onto the query protein's sequence, it is possible to predict at least a portion of that query protein's secondary structure. By integrating this structural alignment approach with conventional (sequence-based) secondary structure methods and then combining it with a "jury-of-experts" system to generate a consensus result, it is possible to attain very high prediction accuracy. Using a sequence-unique test set of 1644 proteins from EVA, this new method achieves an average Q3 score of 81.3%. Extensive testing indicates this is approximately 4–5% better than any other method currently available. Assessments using non-sequence-unique test sets (typical of those used in proteome annotation or structural genomics) indicate that this new method can achieve a Q3 score approaching 88%.

Conclusion: By using both sequence and structure databases and by exploiting the latest techniques in machine learning, it is possible to routinely predict protein secondary structure with an accuracy well above 80%. A program and web server, called PROTEUS, that performs these secondary structure predictions is accessible at http://wishart.biology.ualberta.ca/proteus. For high-throughput or batch sequence analyses, the PROTEUS programs, databases (and server) can be downloaded and run locally.
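
The three ideas in the abstract (mapping a homologue's secondary structure onto the query through an alignment, combining several predictions into a consensus, and scoring the result with Q3) can be illustrated with a minimal Python sketch. This is not the PROTEUS implementation: the function names, the `-` placeholder for unaligned residues, and the toy sequences are all hypothetical, and a simple per-residue majority vote is swapped in for the "jury-of-experts" system, whose details the abstract does not give. Q3 is taken in its usual sense, the percentage of residues assigned the correct one of three states (H = helix, E = strand, C = coil).

```python
from collections import Counter

UNKNOWN = "-"  # hypothetical placeholder: query residue with no aligned template residue


def transfer_structure(query_aln, template_aln, template_ss):
    """Copy a template's secondary structure onto the query via a pairwise alignment.

    query_aln and template_aln are gapped, equal-length aligned sequences;
    template_ss is the template's ungapped secondary structure string.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    mapped = []
    t_pos = 0  # index into the ungapped template structure string
    for q_char, t_char in zip(query_aln, template_aln):
        if q_char != "-":
            # Query residue: inherit the state only if a template residue is aligned.
            mapped.append(template_ss[t_pos] if t_char != "-" else UNKNOWN)
        if t_char != "-":
            t_pos += 1
    return "".join(mapped)


def consensus(predictions):
    """Per-residue majority vote (an assumed stand-in for a jury-of-experts).

    Placeholder states are ignored; ties go to the earliest-listed predictor,
    and a column with no votes defaults to coil.
    """
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have equal length")
    result = []
    for column in zip(*predictions):
        votes = Counter(state for state in column if state != UNKNOWN)
        result.append(votes.most_common(1)[0][0] if votes else "C")
    return "".join(result)


def q3(predicted, observed):
    """Percentage of residues whose three-state assignment matches the observed one."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("strings must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Toy data, invented for illustration only.
structural = transfer_structure("MKV-LAG", "MRVQL-G", "CHHHHC")
combined = consensus([structural, "CHHHHC", "CCHHEC"])
score = q3(combined, "CHHHCC")
```

Listing the structure-derived prediction first means it wins ties in this sketch, which loosely reflects the abstract's premise that a homologue's known structure is strong evidence where it is available; a real system would presumably weight the inputs rather than rely on list order.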
