Prediction of secondary structures of proteins using a two-stage method

Abstract Protein structure determination and prediction have been a focal research subject in the life sciences because protein structure is central to understanding the biological and chemical activities of organisms. The experimental methods used to determine protein structures demand sophisticated equipment and considerable time. A host of computational methods have been developed to predict the location of secondary structure elements in proteins, either to complement experimental results or to provide insight into them. However, the prediction accuracies of these methods rarely exceed 70%. This paper presents a novel two-stage method that predicts the location of secondary structure elements in a protein using only its primary structure data. In the first stage, the folding type of the protein is determined using a novel classification approach for multi-class problems. The second stage uses data available in the Protein Data Bank to determine the likely location of secondary structure elements with a probabilistic search algorithm. The average prediction accuracy is shown to be 74.1% on a large structure dataset.
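The two-stage idea can be illustrated with a small Python sketch. It is not the method of the paper: stage 1 here is a nearest-centroid rule on amino acid composition standing in for the multi-class classification approach, and stage 2 is a per-residue lookup in a propensity table standing in for the probabilistic search over Protein Data Bank data. All function names, the toy centroids, and the propensity values are hypothetical and chosen only for illustration; the accuracy function assumes a simple per-residue agreement measure, which the abstract does not specify.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the fraction of each of the 20 standard amino acids in a sequence."""
    counts = Counter(sequence)
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def predict_folding_type(sequence, centroids):
    """Stage 1 stand-in (assumption): pick the folding type whose composition
    centroid is closest, in squared Euclidean distance, to the sequence."""
    x = composition(sequence)

    def squared_distance(centroid):
        return sum((xi - ci) ** 2 for xi, ci in zip(x, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def predict_secondary_structure(sequence, propensities, default="C"):
    """Stage 2 stand-in (assumption): give each residue its most probable
    state (H, E or C) from a table specific to the predicted folding type."""
    states = []
    for residue in sequence:
        table = propensities.get(residue)
        states.append(max(table, key=table.get) if table else default)
    return "".join(states)


def per_residue_accuracy(predicted, observed):
    """Fraction of residues whose predicted state matches the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("state strings must be non-empty and of equal length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


# Toy data, invented for illustration only.
centroids = {
    "all-alpha": composition("AAAAEELLKK"),
    "all-beta": composition("VVVVIIYYTT"),
}
propensity_tables = {
    "all-alpha": {
        "A": {"H": 0.7, "E": 0.1, "C": 0.2},
        "V": {"H": 0.3, "E": 0.5, "C": 0.2},
    },
    "all-beta": {
        "A": {"H": 0.2, "E": 0.5, "C": 0.3},
        "V": {"H": 0.1, "E": 0.8, "C": 0.1},
    },
}

fold = predict_folding_type("AELKAV", centroids)
states = predict_secondary_structure("AVG", propensity_tables[fold])
```

The point of the split is that the second stage can use statistics conditioned on the folding type rather than a single table for all proteins, which is the structural idea the abstract describes.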
