Predicting dihedral angle probability distributions for protein coil residues from primary sequence using neural networks

Background
Predicting the three-dimensional structure of a protein from its amino acid sequence is currently one of the most challenging problems in bioinformatics. The internal structure of helices and sheets is highly recurrent, which helps reduce the search space significantly. However, random coil segments make up nearly 40% of proteins and have no apparent recurrent patterns, which lowers the overall accuracy of protein structure prediction methods. Previous work has indicated that coil segments are in fact not completely random in structure and that flanking residues appear to have a significant influence on the dihedral angles adopted by the individual amino acids in coil segments. In this work we attempt to predict a probability distribution of these dihedral angles based on the flanking residues. While dihedral angles of coil segments have been predicted previously, none of these attempts have, to our knowledge, presented comparable results for the probability distribution of dihedral angles.

Results
In this paper we develop an artificial neural network that uses an input window of amino acids to predict a dihedral angle probability distribution for the middle residue of the window. The trained neural network shows a significant improvement (4-68%) in predicting the most probable bin (covering a 30° × 30° area of the dihedral angle space) for all amino acids in the data set compared to baseline statistics. An accuracy comparable to that of secondary structure prediction (≈ 80%) is achieved by observing the 20 bins with the highest output values.

Conclusion
Many different protein structure prediction methods exist, and each uses different tools and auxiliary predictions to help determine the native structure. In this work the sequence is used to predict local, context-dependent dihedral angle propensities in coil regions. This predicted distribution can potentially improve tertiary structure prediction methods that are based on sampling the backbone dihedral angles of individual amino acids. The predicted distribution may also help predict local structure fragments used in fragment assembly methods.
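The binning and the "20 bins with highest output values" criterion described in the Results can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the one-hot window encoding, the zero padding at chain ends, and the window half-width are all assumptions made for illustration. Only the 30° × 30° bin size (giving 12 × 12 = 144 bins) and the top-20 evaluation come from the abstract.

```python
BIN_WIDTH = 30                     # degrees per bin side, as stated in the abstract
BINS_PER_AXIS = 360 // BIN_WIDTH   # 12 bins along phi and 12 along psi
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed 20-letter alphabet


def angle_bin(phi, psi):
    """Map (phi, psi) in degrees to a flat bin index in [0, 144)."""
    # Shift to [0, 360) so that -180 and 180 fall in the same bin.
    i = int(((phi + 180.0) % 360.0) // BIN_WIDTH)
    j = int(((psi + 180.0) % 360.0) // BIN_WIDTH)
    return i * BINS_PER_AXIS + j


def one_hot_window(sequence, center, half_width):
    """Encode the residues around `center` as a flat one-hot vector.

    Positions outside the chain are left as all zeros (an assumed
    padding scheme, not taken from the paper).
    """
    vector = []
    for pos in range(center - half_width, center + half_width + 1):
        slot = [0.0] * len(AMINO_ACIDS)
        if 0 <= pos < len(sequence):
            slot[AMINO_ACIDS.index(sequence[pos])] = 1.0
        vector.extend(slot)
    return vector


def top_k_hit(scores, true_bin, k):
    """Return True if `true_bin` is among the k highest-scoring bins."""
    ranked = sorted(range(len(scores)), key=lambda b: scores[b], reverse=True)
    return true_bin in ranked[:k]


def top_k_accuracy(all_scores, true_bins, k):
    """Fraction of residues whose observed bin is among the top k bins."""
    hits = sum(top_k_hit(s, t, k) for s, t in zip(all_scores, true_bins))
    return hits / len(true_bins)
```

Under these assumptions, a network output would be a list of 144 scores per coil residue, and `top_k_accuracy(outputs, observed_bins, 20)` would correspond to the top-20 measure reported above, while `k = 1` corresponds to predicting the single most probable bin.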
