Cascaded multiple classifiers for secondary structure prediction

We describe a new classifier for protein secondary structure prediction, formed by cascading different types of classifiers based on neural networks and linear discrimination. The new classifier achieves an accuracy of 76.7% (assessed by a rigorous full jackknife procedure) on a new nonredundant dataset of 496 nonhomologous sequences (obtained from G.J. Barton and J.A. Cuff). This database was designed specifically for training and testing protein secondary structure prediction methods, and it uses a more stringent definition of homologous sequences than previous studies. We show that it is possible to design classifiers that discriminate well among the three classes (H, E, C), with an accuracy of up to 78% for β-strands, using only a local window and resampling techniques. This indicates that the importance of long-range interactions for the prediction of β-strands has probably been overestimated in the past.
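As a rough illustration of the ideas named above (a local window, a cascade of classifiers, and a full jackknife assessment), the following minimal Python sketch may help. It is not the method of the paper: a simple count-based classifier stands in for the neural networks and linear discriminants, and all names (`windows`, `CountClassifier`, `fit_cascade`, `jackknife_accuracy`), the window widths, and the smoothing constant are assumptions made for this example only.

```python
import math
from collections import defaultdict

CLASSES = "HEC"   # helix, strand, coil
PAD = "-"         # padding symbol used beyond the chain ends
ALPHABET = 25     # assumed smoothing constant (rough alphabet size)


def windows(symbols, width):
    """Return one window of `width` symbols per position (odd width assumed)."""
    half = width // 2
    padded = PAD * half + symbols + PAD * half
    return [padded[i:i + width] for i in range(len(symbols))]


class CountClassifier:
    """Stand-in per-residue classifier: sums smoothed log-frequencies
    of each (window offset, symbol) pair for every class."""

    def __init__(self, width):
        self.width = width
        self.counts = defaultdict(int)        # (offset, symbol, class) -> count
        self.class_totals = defaultdict(int)  # class -> number of residues

    def fit(self, inputs, labels):
        for symbols, target in zip(inputs, labels):
            for win, cls in zip(windows(symbols, self.width), target):
                self.class_totals[cls] += 1
                for offset, sym in enumerate(win):
                    self.counts[(offset, sym, cls)] += 1
        return self

    def _score(self, win, cls):
        total = self.class_totals.get(cls, 0) + ALPHABET
        return sum(
            math.log((self.counts.get((o, s, cls), 0) + 1) / total)
            for o, s in enumerate(win)
        )

    def predict(self, symbols):
        return "".join(
            max(CLASSES, key=lambda c: self._score(win, c))
            for win in windows(symbols, self.width)
        )


def fit_cascade(seqs, structs, width1=13, width2=5):
    """Stage 1 maps sequence windows to classes; stage 2 re-classifies
    windows of stage-1 output. Simplification: stage 2 is trained on
    in-sample stage-1 predictions rather than resampled ones."""
    stage1 = CountClassifier(width1).fit(seqs, structs)
    stage1_out = [stage1.predict(s) for s in seqs]
    stage2 = CountClassifier(width2).fit(stage1_out, structs)
    return stage1, stage2


def predict_cascade(model, seq):
    stage1, stage2 = model
    return stage2.predict(stage1.predict(seq))


def jackknife_accuracy(seqs, structs):
    """Full jackknife: each chain is held out once and predicted by a
    cascade trained on all the others; returns per-residue accuracy."""
    correct = total = 0
    for i in range(len(seqs)):
        model = fit_cascade(seqs[:i] + seqs[i + 1:], structs[:i] + structs[i + 1:])
        pred = predict_cascade(model, seqs[i])
        correct += sum(p == t for p, t in zip(pred, structs[i]))
        total += len(structs[i])
    return correct / total


# Toy data, invented for illustration only.
toy_seqs = ["AAAAGGVVVV", "GGAAAAVVVG", "VVVVGGAAAA"]
toy_structs = ["HHHHCCEEEE", "CCHHHHEEEC", "EEEECCHHHH"]
toy_model = fit_cascade(toy_seqs, toy_structs)
toy_prediction = predict_cascade(toy_model, "AAAAGGVVVV")
toy_accuracy = jackknife_accuracy(toy_seqs, toy_structs)
```

The second stage sees only a short window of first-stage output, so the whole cascade still depends on local sequence context alone. Holding out one whole chain at a time keeps every test residue out of the training data for both stages.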
