Predicting linear B-cell epitopes using amino acid anchoring pair composition

Background: Accurate identification of linear B-cell epitopes plays an important role in peptide vaccine design, immunodiagnosis, and antibody production. Although several prediction methods have been reported, their unsatisfactory accuracy has limited the broad use of linear B-cell epitope prediction. A reliable model with significantly improved prediction accuracy is therefore highly desirable.

Results: In this study, we developed APCpred, a novel model for the prediction of linear B-cell epitopes that combines amino acid anchoring pair composition (APC) with a Support Vector Machine (SVM). Systematic comparisons with existing prediction models demonstrated that APCpred significantly improved prediction accuracy both in fivefold cross-validation on training datasets and on independent blind datasets. In the fivefold cross-validation test with the Chen872 dataset at a window size of 20, APCpred achieved an AUC of 0.809 and an accuracy (ACC) of 72.94%, outperforming existing models such as Bayesb, Chen's AAP method, and the enhanced combination of AAP with five AP scales. In the fivefold cross-validation test with the ABC16 dataset, APCpred achieved an improved AUC of 0.794 and ACC of 73.00% at a window size of 16, and attained an AUC of 0.748 and ACC of 67.96% on the Blind387 dataset after being trained on the ABC16 dataset. Trained on the Lbtope_Confirm dataset, APCpred achieved an increased ACC of 55.09% on the FBC934 dataset. Across sequence window sizes from 12 to 20, the final APCpred model on the homology-reduced dataset achieved its best fivefold cross-validation performance at a window size of 20, with an AUC of 0.748 and ACC of 68.43%.

Conclusion: The APCpred model demonstrated a significant improvement in predicting linear B-cell epitopes using amino acid anchoring pair composition (APC) features. Based on our study, a web server has been developed for online prediction of linear B-cell epitopes, which is freely accessible at: http://ccb.bmi.ac.cn/APCpred/.
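
To make the idea of window-based pair features concrete, the following is a minimal sketch of how a protein sequence could be cut into fixed-size windows and each window turned into a 400-dimensional residue-pair frequency vector. The abstract does not define how anchoring pairs are constructed, so the `gap` parameter, the function names, and the example sequence are all illustrative assumptions and not the authors' APC definition.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def sliding_windows(sequence, size=20):
    """Return every contiguous window of `size` residues in `sequence`."""
    return [sequence[i:i + size] for i in range(len(sequence) - size + 1)]


def pair_composition(peptide, gap=0):
    """Frequency of each ordered residue pair separated by `gap` positions.

    Hypothetical stand-in for a pair-composition feature; the real APC
    encoding used by APCpred is not specified in the abstract.
    """
    step = gap + 1
    pairs = [peptide[i] + peptide[i + step] for i in range(len(peptide) - step)]
    counts = Counter(pairs)
    total = len(pairs)
    # Fixed ordering over all 20 x 20 ordered pairs gives a 400-dim vector.
    return [
        counts[a + b] / total if total else 0.0
        for a, b in product(AMINO_ACIDS, repeat=2)
    ]


# Made-up 25-residue sequence, used only to exercise the functions.
example_sequence = "MKTAYIAKQRQISFVKSHFSRQLEE"
windows = sliding_windows(example_sequence, size=20)
features = [pair_composition(w) for w in windows]
```

Each resulting vector could then be passed to an SVM and evaluated with fivefold cross-validation, as the abstract describes; the classifier and its parameters are left out here because the abstract gives no details about them.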
