Real‐SPINE: An integrated system of neural networks for real‐value prediction of protein structural properties

Proteins can move freely in three‐dimensional space. As a result, their structural properties, such as solvent‐accessible surface area, backbone dihedral angles, and atomic distances, are continuous variables. However, these properties are often arbitrarily divided into a few classes to facilitate prediction by statistical learning techniques. In this work, we establish an integrated system of neural networks (called Real‐SPINE) for real‐value prediction and apply the method to predict residue solvent accessibility and backbone ψ dihedral angles of proteins based on information derived from sequences only. Real‐SPINE is trained with a large data set of 2640 protein chains, sequence profiles generated from multiple sequence alignment, representative amino‐acid properties, a slow learning rate, overfitting protection, and predicted secondary structures. The method optimizes more than 200,000 weights and yields a 10‐fold cross‐validated Pearson's correlation coefficient (PCC) of 0.74 between predicted and actual solvent‐accessible surface areas and 0.62 between predicted and actual ψ angles. In particular, 90% of the 2640 proteins have a PCC value greater than 0.6 between predicted and actual solvent‐accessible surface areas. For comparison, the best previously reported correlation coefficients are 0.64–0.67 for solvent‐accessible surface areas and 0.47 for ψ angles. The Real‐SPINE server, executable programs, and datasets are freely available at http://sparks.informatics.iupui.edu. Proteins 2007. © 2007 Wiley‐Liss, Inc.
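The evaluation measure quoted above, Pearson's correlation coefficient between predicted and actual values, can be illustrated with a minimal sketch. This is not the Real‐SPINE code: the function names `pearson_cc` and `fraction_above` and the numbers in the usage lines are hypothetical, chosen only to show how a per‐chain PCC and the fraction of chains above a threshold such as 0.6 might be computed.

```python
import math


def pearson_cc(predicted, actual):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    if n != len(actual) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    # Sum of cross-products and sums of squares of the deviations from the mean.
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_p == 0 or var_a == 0:
        raise ValueError("PCC is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_a)


def fraction_above(pcc_values, threshold):
    """Fraction of chains whose PCC is strictly greater than the threshold."""
    return sum(1 for v in pcc_values if v > threshold) / len(pcc_values)


# Hypothetical per-residue values for one chain (illustrative only).
predicted_asa = [12.0, 48.5, 90.1, 30.2, 5.4]
actual_asa = [10.0, 55.0, 80.0, 41.0, 2.0]
chain_pcc = pearson_cc(predicted_asa, actual_asa)

# Hypothetical per-chain PCC values; 0.6 is the threshold used in the abstract.
share = fraction_above([0.55, 0.71, 0.80, 0.64], 0.6)
```

The PCC is insensitive to a linear rescaling of the predictions, so it measures how well the predicted profile tracks the actual one rather than the absolute error.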
