Distributions of amino acids suggest that certain residue types more effectively determine protein secondary structure

The exponential growth in the number of available protein sequences far outpaces the slower growth in the number of known structures. As a result, the development of fast and efficient protein secondary structure prediction methods is essential for a broad understanding of protein structures. Computational methods that efficiently determine secondary structure can in turn facilitate protein tertiary structure prediction, since most tertiary structure methods start from secondary structure predictions. Recently, we developed a fast learning optimized prediction methodology (FLOPRED) for predicting protein secondary structure (Saraswathi et al. in JMM 18:4275, 2012). Data are generated by using knowledge-based potentials combined with structure information from the CATH database. A neural network-based extreme learning machine (ELM) and advanced particle swarm optimization (PSO) are applied to these data to achieve faster convergence to more accurate secondary structure predictions. A five-fold cross-validated testing accuracy of 83.8% and a segment overlap (SOV) score of 78.3% are obtained in this study. Secondary structure predictions and their accuracy are usually presented for three secondary structure elements (α-helix, β-strand and coil), but the results have rarely been analyzed with respect to their constituent amino acids. In this paper, we use the results obtained with FLOPRED to describe in detail how different amino acid types behave in secondary structure prediction. We investigate the influence of the composition, physico-chemical properties and position-specific occurrence preferences of amino acids within secondary structure elements. In addition, we identify the correlation between these properties and prediction accuracy. The detailed results suggest several important ways in which secondary structure predictions can be improved, which might in turn lead to improved protein design and engineering.
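
The per-residue analysis described above can be illustrated with a minimal Python sketch. It is not the FLOPRED implementation: the function names (per_residue_accuracy, q3), the one-letter state codes (H for α-helix, E for β-strand, C for coil) and the toy sequence are assumptions made only for this example. The sketch breaks the usual three-state per-residue accuracy down by amino acid type, so that residue types that are predicted less reliably become visible.

```python
from collections import defaultdict


def q3(observed, predicted):
    """Fraction of positions whose predicted state matches the observed state."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have equal length")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return matches / len(observed)


def per_residue_accuracy(sequence, observed, predicted):
    """Return {amino_acid: (n_correct, n_total, fraction_correct)}.

    sequence  : one-letter amino acid codes
    observed  : observed states, one per residue (H, E or C; assumed coding)
    predicted : predicted states, same coding and length
    """
    if not (len(sequence) == len(observed) == len(predicted)):
        raise ValueError("all three inputs must have equal length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for aa, obs, pred in zip(sequence, observed, predicted):
        total[aa] += 1
        if obs == pred:
            correct[aa] += 1
    return {aa: (correct[aa], total[aa], correct[aa] / total[aa]) for aa in total}


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    seq = "AGAVGAV"
    obs = "HCHECHE"
    pred = "HCHECCE"
    print("overall accuracy:", round(q3(obs, pred), 3))
    for aa, (n_ok, n_all, frac) in sorted(per_residue_accuracy(seq, obs, pred).items()):
        print(aa, n_ok, n_all, round(frac, 3))
```

In this toy case only one alanine is mispredicted, so the breakdown attributes the whole error to that residue type while glycine and valine are predicted correctly. On real data the same tally could be split further by secondary structure element or by position within an element.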
