The prediction of protein structural class using averaged chemical shifts

Knowledge of a protein's structural class can provide important information about its folding pattern. Many approaches have been developed for predicting protein structural classes, but the information they use is derived primarily from amino acid sequences. In this study, a novel method is presented that predicts protein structural classes from chemical shift (CS) information derived from nuclear magnetic resonance spectra. First, a dataset of 399 non-homologous proteins (about 15% sequence identity) was constructed to investigate the distribution of the averaged CS values of six nuclei (13CO, 13Cα, 13Cβ, 1HN, 1Hα and 15N) in three protein structural classes. Subsequently, a support vector machine was used to predict the three structural classes from the averaged CS information of the six nuclei. The overall accuracy in jackknife cross-validation reached 87.0%. Finally, a feature selection technique was applied to exclude redundant information and identify an optimized feature set. The results show that the overall accuracy increased to 88.0% when only the averaged CSs of 13CO, 1Hα and 15N were used. The proposed approach outperformed other state-of-the-art methods in predictive accuracy, particularly on low-similarity protein data. We expect that the proposed approach will be an excellent alternative to traditional methods for protein structural class prediction.
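The pipeline described above (average the chemical shifts per nucleus, then score a classifier by jackknife cross-validation) can be illustrated with a minimal sketch. This is not the authors' implementation: a nearest-centroid classifier is swapped in for the support vector machine so that the example needs only the Python standard library, and the function names, class labels and shift values are hypothetical, made up purely for illustration.

```python
from statistics import mean

# Nucleus keys are illustrative names for 13CO, 13Calpha, 13Cbeta, 1HN, 1Halpha, 15N.
NUCLEI = ("CO", "CA", "CB", "HN", "HA", "N")


def averaged_shifts(residue_shifts, nuclei=NUCLEI):
    """Return one averaged chemical shift per nucleus for a single protein.

    residue_shifts is a list of dicts, one per residue, mapping a nucleus
    name to its measured shift in ppm. Residues lacking a nucleus are skipped.
    """
    features = []
    for nucleus in nuclei:
        values = [r[nucleus] for r in residue_shifts if nucleus in r]
        if not values:
            raise ValueError(f"no shifts reported for nucleus {nucleus}")
        features.append(mean(values))
    return features


def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier (NOT an SVM): pick the class with the closest mean vector."""
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [f for f, y in zip(train_X, train_y) if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return min(centroids, key=lambda label: squared_distance(centroids[label], x))


def jackknife_accuracy(X, y, predict=nearest_centroid_predict):
    """Leave each protein out in turn, train on the rest, and score the held-out one."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Made-up averaged shifts (ppm) for six hypothetical proteins in three classes.
X = [
    [177.5, 58.0, 36.0, 8.1, 4.1, 118.5],
    [177.8, 58.3, 35.8, 8.0, 4.0, 118.9],
    [175.0, 56.0, 39.0, 8.5, 4.8, 121.5],
    [174.8, 55.8, 39.3, 8.6, 4.9, 121.9],
    [176.2, 57.0, 37.5, 8.3, 4.4, 120.0],
    [176.4, 57.2, 37.3, 8.3, 4.5, 120.2],
]
y = ["class_1", "class_1", "class_2", "class_2", "class_3", "class_3"]

accuracy = jackknife_accuracy(X, y)
```

To mimic the feature selection step, the same `jackknife_accuracy` call can be repeated on a subset of columns (for example only the CO, HA and N columns) and the accuracies compared. With real data the features would normally be rescaled first, since 15N and 1H shifts differ by more than an order of magnitude.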
