Learning protein secondary structure from sequential and relational data

We propose a method for sequential supervised learning that exploits explicit knowledge of both short- and long-range dependencies. The architecture is a bidirectional recursive neural network that takes as input a sequence together with an associated interaction graph, which encodes (possibly partial) knowledge about long-range dependency relations. We tested the method on protein secondary structure prediction, a task in which relations due to beta-strand pairings and other spatial proximities are known to have a significant effect on prediction accuracy. In this task, interactions can be derived from protein contact maps at the residue level. Our results show that integrating interaction graphs can significantly improve prediction accuracy.
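As a rough illustration of the idea, the sketch below shows one way a bidirectional recursion could consume a sequence together with an interaction graph derived from a residue-level contact map. It is not the authors' implementation: the function names (`contact_edges`, `sweep`, `predict`), the `min_separation` cutoff, the weight shapes, the summing of neighbour states, and the random untrained weights are all assumptions made for this example. The forward pass only reads states of earlier interacting residues and the backward pass only reads states of later ones, so each pass remains acyclic.

```python
import numpy as np

def contact_edges(contact_map, min_separation=3):
    """List interacting pairs (i, j), i < j, from a binary contact map.
    min_separation is a hypothetical cutoff that skips near-diagonal contacts."""
    n = contact_map.shape[0]
    return [(i, j) for i in range(n)
            for j in range(i + min_separation, n) if contact_map[i, j]]

def sweep(x, neighbours, weights, step):
    """One directional pass. step=+1 runs left to right, step=-1 right to left.
    neighbours[t] must only hold positions visited before t in this pass."""
    W_in, W_chain, W_graph = weights
    T, h = x.shape[0], W_chain.shape[0]
    state = np.zeros((T, h))
    order = range(T) if step == 1 else range(T - 1, -1, -1)
    for t in order:
        # Short-range context: the adjacent position already visited.
        prev = state[t - step] if 0 <= t - step < T else np.zeros(h)
        # Long-range context: summed states of interacting positions.
        graph = sum((state[j] for j in neighbours[t]), np.zeros(h))
        state[t] = np.tanh(W_in @ x[t] + W_chain @ prev + W_graph @ graph)
    return state

def predict(x, edges, params):
    """Return (T, n_classes) class probabilities for a (T, d) input sequence."""
    T = x.shape[0]
    earlier = [[] for _ in range(T)]
    later = [[] for _ in range(T)]
    for i, j in edges:          # i < j by construction
        earlier[j].append(i)    # forward pass at j may read the state at i
        later[i].append(j)      # backward pass at i may read the state at j
    fwd = sweep(x, earlier, params["fwd"], step=1)
    bwd = sweep(x, later, params["bwd"], step=-1)
    logits = np.concatenate([x, fwd, bwd], axis=1) @ params["out"].T
    logits -= logits.max(axis=1, keepdims=True)   # numerically stable softmax
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

# Toy usage with random, untrained weights (all sizes are illustrative).
rng = np.random.default_rng(0)
T, d, h, c = 6, 4, 5, 3   # residues, input features, hidden units, classes

def make_weights():
    return (rng.normal(0, 0.5, (h, d)),
            rng.normal(0, 0.5, (h, h)),
            rng.normal(0, 0.5, (h, h)))

params = {"fwd": make_weights(), "bwd": make_weights(),
          "out": rng.normal(0, 0.5, (c, d + 2 * h))}
x = rng.normal(size=(T, d))
contact_map = np.eye(T, dtype=bool)
contact_map[0, 5] = contact_map[5, 0] = True   # one long-range contact
edges = contact_edges(contact_map)
probs_with_graph = predict(x, edges, params)
probs_without_graph = predict(x, [], params)
```

In this toy setup the single contact between residues 0 and 5 changes the predictions at those two positions while leaving the interior positions untouched, which is the kind of long-range effect the interaction graph is meant to supply. A trained model would of course learn the weights from labelled sequences instead of sampling them.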
