Matching Protein β-Sheet Partners by Feedforward and Recurrent Neural Networks

Predicting the secondary structure (α-helices, β-sheets, coils) of proteins is an important step towards understanding their three-dimensional conformations. Unlike α-helices, which are built up from one contiguous region of the polypeptide chain, β-sheets are more complex, resulting from a combination of two or more disjoint regions. The exact nature of these long-distance interactions remains unclear. Here we introduce two neural-network-based methods for the prediction of amino acid partners in parallel as well as antiparallel β-sheets. The neural architectures predict whether two residues located at the centers of two distant windows are paired or not in a β-sheet structure. Variations on these architectures, including profiles and ensembles, are trained and tested via five-fold cross-validation using a large corpus of curated data. Prediction on both coupled and non-coupled residues currently approaches 84% accuracy, better than any previously reported method.
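As an illustration of the two-window input described above, the following is a minimal sketch of how a candidate residue pair might be turned into a feature vector for a feedforward classifier. It is not the authors' implementation: the one-hot encoding, the padding symbol, the window half-width, and the appended sequence-separation feature are all assumptions made for the example, and the helper names (`one_hot`, `window`, `pair_features`) are hypothetical.

```python
# Sketch only: encoding, window size, and the separation feature are assumptions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: k for k, aa in enumerate(AMINO_ACIDS)}


def one_hot(residue):
    """Return a 21-dim vector: 20 amino acids plus one slot for padding/unknown."""
    vec = [0.0] * (len(AMINO_ACIDS) + 1)
    vec[AA_INDEX.get(residue, len(AMINO_ACIDS))] = 1.0
    return vec


def window(sequence, center, half_width):
    """Residues centered on `center`, padded with '-' beyond the chain ends."""
    return [
        sequence[k] if 0 <= k < len(sequence) else "-"
        for k in range(center - half_width, center + half_width + 1)
    ]


def pair_features(sequence, i, j, half_width=3):
    """Concatenate the encoded windows around residues i and j."""
    features = []
    for center in (i, j):
        for residue in window(sequence, center, half_width):
            features.extend(one_hot(residue))
    # Assumed extra input: how far apart the two window centers are in sequence.
    features.append(float(abs(i - j)))
    return features


if __name__ == "__main__":
    x = pair_features("MKVLAAGIVGL", 2, 9, half_width=2)
    print(len(x))  # two windows of 5 residues, 21 values each, plus 1
```

A vector like this would be labeled 1 if residues i and j are partners in a β-sheet and 0 otherwise, then fed to a binary classifier. Replacing the one-hot vectors with per-position profile columns would correspond to the profile-based variants mentioned in the abstract.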
