iFC2: an integrated web-server for improved prediction of protein structural class, fold type, and secondary structure content

Several descriptors of protein structure at the sequence and residue levels have recently been proposed. They are widely used in the analysis and prediction of structural and functional characteristics of proteins. Numerous in silico methods have been developed for sequence-based prediction of these descriptors. However, many of them lack a public web-server, and only a few integrate multiple descriptors to improve the predictions. We introduce the iFC2 (integrated prediction of fold, class, and content) server, the first to integrate three modern predictors of sequence-level descriptors: fold type (PFRES), structural class (SCEC), and secondary structure content (PSSC-core). The server exploits relations between the three descriptors to implement a cross-evaluation procedure that improves on the predictions of the individual methods. iFC2 annotates each fold and class prediction as potentially correct or potentially incorrect. When tested on datasets with low-similarity chains, for the fold prediction iFC2 labels 82% of the PFRES predictions as correct and the accuracy of these predictions equals 72%. The accuracy of the remaining 28% of the PFRES predictions equals 38%. Similarly, our server labels over 79% of the SCEC predictions as correct, and these predictions are shown to be 98% accurate, while the remaining SCEC predictions are only 15% accurate. These results are shown to be competitive with recent relevant web-servers. Predictions on CASP8 targets show that the content predicted by iFC2 is competitive with the content computed from the tertiary structures predicted by the three best-performing methods in CASP8. The iFC2 server is available at http://biomine.ece.ualberta.ca/1D/1D.html.
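
As a rough illustration of how the two annotation groups combine, the sketch below computes the overall accuracy implied by the SCEC figures above. It assumes that the group labeled as correct covers exactly 79% of the predictions (the text says "over 79%") and that the remaining 21% form the second group. The function name `overall_accuracy` is hypothetical and is not part of the server.

```python
def overall_accuracy(share_labeled_correct, acc_labeled_correct, acc_remaining):
    """Weighted accuracy over two disjoint groups of predictions.

    share_labeled_correct: fraction of predictions annotated as potentially correct
    acc_labeled_correct:   accuracy within that group
    acc_remaining:         accuracy within the remaining predictions
    """
    if not 0.0 <= share_labeled_correct <= 1.0:
        raise ValueError("share must be a fraction between 0 and 1")
    share_remaining = 1.0 - share_labeled_correct
    return (share_labeled_correct * acc_labeled_correct
            + share_remaining * acc_remaining)


# SCEC figures from the text, with the share assumed to be exactly 0.79
scec_overall = overall_accuracy(0.79, 0.98, 0.15)
print(f"Implied overall SCEC accuracy: {scec_overall:.3f}")
```

Under these assumptions the annotation does not change the underlying predictions; it separates them into a large, highly reliable group and a small group that should be treated with caution.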
