Prediction of the parallel/antiparallel orientation of beta-strands using amino acid pairing preferences and support vector machines.

In principle, structural information about a protein sequence with no detectable homology to a protein of known structure can be obtained by predicting the arrangement of its secondary structural elements. Although several ab initio methods for protein structure prediction have been reported, the long-range interactions needed to accurately predict the tertiary structures of beta-sheet-containing proteins remain difficult to simulate. To address this problem and facilitate de novo structure prediction for beta-sheet-containing proteins, we developed a support vector machine (SVM) approach that classifies the orientation of beta-strands as parallel or antiparallel using interstrand amino acid pairing preferences. Based on second-order statistics of the relative frequencies of each possible interstrand amino acid pair, we defined an average amino acid pairing encoding matrix (APEM) that encodes beta-strands as input to the prediction model. The method achieved a prediction accuracy of 86.89% and a Matthews correlation coefficient of 0.71 in 7-fold cross-validation on a non-redundant protein dataset from PISCES. Although several issues remain to be studied, the results suggest that amino acid pairs make an important contribution to beta-strand orientation, and the method could be combined with other algorithms toward a full 'identification' of beta-strands.
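
The two quantitative ideas in the abstract, pairing preferences derived from relative pair frequencies and evaluation by accuracy and the Matthews correlation coefficient, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact APEM definition, so the log-odds score below is an assumed stand-in, and the function names (pair_log_odds, mean_pair_score, accuracy, matthews_cc) are hypothetical.

```python
import math
from collections import Counter


def pair_log_odds(pairs):
    """Log-odds preference for each unordered interstrand residue pair.

    Assumption: preference = log(observed pair frequency / frequency
    expected from single-residue frequencies). The paper's APEM may be
    defined differently.
    """
    pair_counts = Counter(tuple(sorted(p)) for p in pairs)
    residue_counts = Counter(res for p in pairs for res in p)
    n_pairs = sum(pair_counts.values())
    n_residues = sum(residue_counts.values())
    scores = {}
    for (a, b), count in pair_counts.items():
        observed = count / n_pairs
        expected = (residue_counts[a] / n_residues) * (residue_counts[b] / n_residues)
        if a != b:
            expected *= 2  # an unordered pair (a, b) can arise in two orders
        scores[(a, b)] = math.log(observed / expected)
    return scores


def mean_pair_score(strand_a, strand_b, scores):
    """Average preference over aligned positions of two equal-length strands.

    Pairs absent from the score table contribute 0 (an assumed default).
    """
    aligned = [tuple(sorted(p)) for p in zip(strand_a, strand_b)]
    return sum(scores.get(p, 0.0) for p in aligned) / len(aligned)


def accuracy(tp, tn, fp, fn):
    """Fraction of strand pairs whose orientation was predicted correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from a 2x2 confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

In a pipeline of this kind, one score table would plausibly be built from parallel pairs and another from antiparallel pairs, with the resulting features passed to an SVM; that arrangement is an assumption here, not a detail stated in the abstract.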
