Harnessing Computational Biology for Exact Linear B-Cell Epitope Prediction: A Novel Amino Acid Composition-Based Feature Descriptor.

Proteins carry epitopes that serve as their antigenic determinants. Epitopes occupy a central place in integrative biology and are targets for the development of novel vaccines, pharmaceuticals, and systems diagnostics. T-cell and B-cell epitopes have been studied extensively because of their potential in synthetic vaccine design. However, reliable prediction of linear B-cell epitopes remains a formidable challenge. Earlier studies have reported differences in amino acid composition between epitopes and non-epitopes. Hence, this study proposed and developed a novel amino acid composition-based feature descriptor, Dipeptide Deviation from Expected Mean (DDE), to distinguish linear B-cell epitopes from non-epitopes effectively. In this study, for the first time, only exact linear B-cell epitopes and non-epitopes were used to develop the prediction method, unlike the epitope-containing regions used in earlier reports. To evaluate the performance of the DDE feature vector, models were developed with two widely used machine-learning techniques: Support Vector Machine and AdaBoost-Random Forest. In five-fold cross-validation, the proposed method achieved an overall accuracy between nearly 61% and 73% on the error-free dataset and on datasets from other studies, with balanced sensitivity and specificity. The DDE feature vector outperformed other amino acid-derived features on different datasets, with an accuracy difference of about 2% to 12%. These results reflect the efficiency of the DDE feature vector in improving linear B-cell epitope prediction compared with other feature representations. The proposed method is freely available as a stand-alone tool for researchers, particularly those interested in vaccine design and novel molecular target development for systems therapeutics and diagnostics: https://github.com/brsaran/LBEEP.
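
The abstract names the DDE descriptor but does not give its formula. As a rough illustration only, the sketch below assumes one plausible formulation: for each of the 400 dipeptides, the observed frequency in the peptide is compared with an expected frequency derived from codon counts in the standard genetic code, and the difference is scaled by a binomial-style standard deviation. The function name `dde_vector`, the codon-based expectation, and the variance term are assumptions made for this example and are not taken from the LBEEP code.

```python
import math
from itertools import product

# Number of sense codons per amino acid in the standard genetic code (61 total).
CODONS = {
    "A": 4, "C": 2, "D": 2, "E": 2, "F": 2, "G": 4, "H": 2, "I": 3,
    "K": 2, "L": 6, "M": 1, "N": 2, "P": 4, "Q": 2, "R": 6, "S": 6,
    "T": 4, "V": 4, "W": 1, "Y": 2,
}
AMINO_ACIDS = sorted(CODONS)
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
TOTAL_CODONS = sum(CODONS.values())


def dde_vector(peptide):
    """Return a 400-dimensional deviation-from-expected-mean vector (assumed form)."""
    peptide = peptide.upper()
    n_pairs = len(peptide) - 1
    if n_pairs < 1:
        raise ValueError("peptide must contain at least two residues")

    # Count overlapping dipeptides; pairs with non-standard residues are skipped.
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(n_pairs):
        pair = peptide[i:i + 2]
        if pair in counts:
            counts[pair] += 1

    features = []
    for pair in DIPEPTIDES:
        observed = counts[pair] / n_pairs
        # Assumed expected mean: product of the codon fractions of both residues.
        expected = (CODONS[pair[0]] / TOTAL_CODONS) * (CODONS[pair[1]] / TOTAL_CODONS)
        # Assumed variance: binomial variance of a proportion over n_pairs trials.
        variance = expected * (1.0 - expected) / n_pairs
        features.append((observed - expected) / math.sqrt(variance))
    return features
```

A call such as `dde_vector("ACDEFGHIK")` returns one value per dipeptide in `DIPEPTIDES` order; vectors of this kind could then be passed to any standard classifier, such as the Support Vector Machine mentioned above.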
