Prediction of B-cell epitopes using evolutionary information and propensity scales

Background: Computational tools that accurately predict the presence and location of B-cell epitopes on pathogenic proteins are valuable to the field of vaccinology. Because B-cell epitopes are highly variable and poorly understood, their prediction presents a great challenge to computational immunologists.

Methods: We propose a method, BEEPro (B-cell epitope prediction by evolutionary information and propensity scales), which adapts a linear averaging scheme on 16 properties using a support vector machine model to predict both linear and conformational B-cell epitopes. These 16 properties comprise a position-specific scoring matrix (PSSM), an amino acid ratio scale, and a set of 14 physicochemical scales obtained via a feature selection process. A three-way data split procedure is used during validation to prevent over-estimation of prediction performance and to avoid bias in the experimental results.

Results: We first use a non-redundant linear B-cell epitope dataset curated by Sollner et al. for feature selection and parameter optimization. Evaluated by the three-way data split procedure on the Sollner dataset, BEEPro achieves an area under the receiver operating characteristic curve (AUC) of 0.9987, accuracy of 99.29%, Matthews correlation coefficient (MCC) of 0.9281, sensitivity of 0.9604, specificity of 0.9946, and positive predictive value (PPV) of 0.9042. When the same parameters are used to evaluate performance on other independent linear B-cell epitope test datasets, BEEPro attains AUC values ranging from 0.9874 to 0.9950 and accuracies ranging from 93.73% to 97.31%. Moreover, five-fold cross-validation on one benchmark conformational B-cell epitope dataset yields an accuracy of 92.14% and an AUC of 0.9066.

Conclusions: Compared with other current models, our method achieves a significant improvement with respect to AUC, accuracy, MCC, sensitivity, specificity, and PPV. Thus, we have shown that an appropriate combination of evolutionary information and propensity scales with a support vector machine model can significantly enhance the prediction performance for both linear and conformational B-cell epitopes.
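
The threshold-based metrics reported above (accuracy, MCC, sensitivity, specificity, and PPV) all derive from the four cells of a binary confusion matrix. The following Python sketch shows the standard definitions; the function name `binary_metrics` and the example counts are hypothetical, chosen only for illustration, and are not taken from the BEEPro experiments. AUC is omitted because it requires ranked prediction scores rather than counts.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    tp/fn: epitope residues predicted as epitope / non-epitope.
    tn/fp: non-epitope residues predicted as non-epitope / epitope.
    """
    def ratio(num, den):
        # Return 0.0 when a denominator is empty instead of raising.
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "ppv": ratio(tp, tp + fp),           # positive predictive value
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical counts for illustration only (not data from the paper).
metrics = binary_metrics(tp=90, fp=10, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because epitope residues are typically far outnumbered by non-epitope residues, accuracy alone can look high even for a weak predictor; MCC and PPV are more sensitive to errors on the minority class, which is one reason to report them together.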
