Prediction of protein structural class for the twilight zone sequences.

Structural class characterizes the overall folding type of a protein or of its domain. This paper develops an accurate method for in silico prediction of structural classes from low-homology (twilight zone) protein sequences. The proposed LLSC-PRED method applies a linear logistic regression classifier to a custom-designed, feature-based sequence representation. The main advantages of LLSC-PRED are its comprehensive representation, which includes 58 features describing the composition and physicochemical properties of the sequences, and the transparency of its prediction model. The representation also includes predicted secondary structure content, thus exploring, for the first time, the synergy between these two related predictions. In tests on a large set of 1673 twilight zone domains, LLSC-PRED achieves a prediction accuracy of over 62%, which is better than the accuracy of over a dozen recently published competing in silico methods and similar to that of other, non-transparent classifiers that use the proposed representation.
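
To make the general idea concrete, the following is a minimal sketch of a linear (softmax) logistic regression classifier applied to a feature-based sequence representation. It is not the LLSC-PRED implementation: it assumes only the 20 amino acid composition features rather than the 58 features described above, omits predicted secondary structure content, and trains with plain stochastic gradient descent. The function names, the toy sequences, and the two class labels are all hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Return the 20 amino acid frequencies of a sequence (assumed feature set)."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS) or 1  # avoid division by zero
    return [counts[a] / total for a in AMINO_ACIDS]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def linear_scores(weights, x):
    """One linear score per class; the last weight in each row is the bias."""
    xb = x + [1.0]
    return [sum(w * v for w, v in zip(row, xb)) for row in weights]


def train(features, labels, n_classes, lr=0.5, epochs=300):
    """Fit a softmax linear logistic regression model with plain SGD."""
    n_feat = len(features[0])
    weights = [[0.0] * (n_feat + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        for x, label in zip(features, labels):
            xb = x + [1.0]
            probs = softmax(linear_scores(weights, x))
            for k in range(n_classes):
                # Gradient of the cross-entropy loss with respect to score k.
                err = probs[k] - (1.0 if k == label else 0.0)
                for j in range(n_feat + 1):
                    weights[k][j] -= lr * err * xb[j]
    return weights


def predict(weights, x):
    """Return the index of the class with the highest linear score."""
    scores = linear_scores(weights, x)
    return max(range(len(scores)), key=lambda k: scores[k])


# Invented toy data: class 0 and class 1 are placeholder labels,
# not real annotated protein domains.
train_seqs = ["AEELLKKAEELLKKLAEE", "KELLAEKLKEAAELLKKA",
              "VTVTVYVTIVTYTVIV", "TVYVIVTTVIYVTVTI"]
train_labels = [0, 0, 1, 1]

X = [composition(s) for s in train_seqs]
model = train(X, train_labels, n_classes=2)
predictions = [predict(model, x) for x in X]
```

Because the model is linear, each learned weight ties one feature to one class, which is the sense in which such a classifier can be inspected directly. A real experiment would need a proper low-homology dataset and out-of-sample evaluation rather than the training-set check used here.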
