Prediction of beta-hairpins in proteins using physicochemical properties and structure information.

In this study, we propose and evaluate a new method for predicting β-hairpins in proteins based on the support vector machine (SVM). Unlike previous methods, it adopts a new feature representation scheme based on auto covariance. We also investigate two structural properties of proteins (secondary structure and residue conformation propensity) and examine their effects on prediction. Moreover, we employ an ensemble classifier based on majority voting to improve the accuracy of β-hairpin prediction. Experimental results on a dataset of 1926 protein chains show that our approach outperforms previously published methods, demonstrating its effectiveness.
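The two feature and ensemble ingredients named above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the auto covariance definition commonly used in sequence-based prediction, AC(lag, j) = (1/(n − lag)) Σ_{i=1}^{n−lag} (X_{i,j} − X̄_j)(X_{i+lag,j} − X̄_j), where X_{i,j} is the j-th physicochemical property of residue i and X̄_j is its mean over the segment. The property table, lag range, 0/1 label convention, and the function names `auto_covariance` and `majority_vote` are hypothetical choices for illustration. The SVM training step itself is omitted.

```python
from typing import Dict, List, Sequence


def auto_covariance(seq: str, props: Dict[str, Sequence[float]], max_lag: int) -> List[float]:
    """Auto covariance features for lags 1..max_lag and every property.

    `props` maps a residue letter to its vector of physicochemical property
    values (assumed already normalized). Features are ordered lag-major:
    (lag 1, prop 0), (lag 1, prop 1), ..., (lag max_lag, last prop).
    """
    n = len(seq)
    if not 1 <= max_lag < n:
        raise ValueError("max_lag must satisfy 1 <= max_lag < len(seq)")
    values = [props[aa] for aa in seq]  # n x p property matrix
    p = len(values[0])
    # Mean of each property over the whole segment.
    means = [sum(row[j] for row in values) / n for j in range(p)]
    features: List[float] = []
    for lag in range(1, max_lag + 1):
        for j in range(p):
            # Covariance between residue i and residue i + lag for property j.
            total = sum(
                (values[i][j] - means[j]) * (values[i + lag][j] - means[j])
                for i in range(n - lag)
            )
            features.append(total / (n - lag))
    return features


def majority_vote(per_classifier: Sequence[Sequence[int]]) -> List[int]:
    """Combine 0/1 labels (1 = hairpin, an assumed convention) sample by sample.

    A sample is labeled 1 only if more than half of the classifiers say 1,
    so ties (possible with an even number of classifiers) resolve to 0.
    """
    k = len(per_classifier)
    return [1 if 2 * sum(votes) > k else 0 for votes in zip(*per_classifier)]


if __name__ == "__main__":
    # Hypothetical two-property table for a toy alphabet, not real AAindex values.
    toy_props = {"A": [1.0, 2.0], "B": [-1.0, 2.0]}
    print(auto_covariance("ABAB", toy_props, max_lag=2))
    print(majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]]))
```

In this toy run the alternating first property gives a negative covariance at lag 1 and a positive one at lag 2, while the constant second property contributes zeros. Auto covariance turns a variable-length segment into a fixed-length vector, which is what allows it to be fed to an SVM; majority voting then combines the outputs of several such classifiers, for example ones trained with and without the structural features.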
