Vaxign-ML: supervised machine learning reverse vaccinology model for improved prediction of bacterial protective antigens

MOTIVATION Reverse vaccinology (RV) is a milestone in rational vaccine design, and machine learning (ML) has been applied to enhance the accuracy of RV prediction. However, ML-based RV still faces challenges in prediction accuracy and program accessibility.

RESULTS This study presents Vaxign-ML, a supervised ML classification model for predicting bacterial protective antigens. To identify the best ML method with optimized conditions, five ML methods were tested with biological and physicochemical features extracted from well-defined training data. Nested five-fold cross-validation and leave-one-pathogen-out validation were used to ensure unbiased performance assessment and to test the capability to predict vaccine candidates against a newly emerging pathogen. The best-performing model, Vaxign-ML, was compared with three publicly available RV programs on a high-quality benchmark dataset. Vaxign-ML showed superior performance in predicting bacterial protective antigens. Vaxign-ML is deployed on a publicly available web server.

AVAILABILITY The Vaxign-ML website is at http://www.violinet.org/vaxign/vaxign-ml. A standalone Docker version of Vaxign-ML is available at https://hub.docker.com/r/e4ong1031/vaxign-ml, and the source code is available at https://github.com/VIOLINet/Vaxign-ML-docker.

SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
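The leave-one-pathogen-out validation mentioned in the results can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `leave_one_pathogen_out` and the toy pathogen labels are hypothetical, and the sketch only shows how proteins might be partitioned so that every protein from one pathogen is held out for testing while the model is trained on all the others.

```python
from collections import defaultdict


def leave_one_pathogen_out(pathogens):
    """Yield (held_out, train_idx, test_idx) once per distinct pathogen.

    `pathogens` holds one pathogen label per protein (hypothetical input).
    """
    # Group protein indices by the pathogen they come from.
    by_pathogen = defaultdict(list)
    for idx, name in enumerate(pathogens):
        by_pathogen[name].append(idx)

    # Hold out each pathogen in turn; train on every other pathogen.
    for held_out in sorted(by_pathogen):
        test_idx = by_pathogen[held_out]
        train_idx = [i for i, name in enumerate(pathogens) if name != held_out]
        yield held_out, train_idx, test_idx


# Toy labels for illustration only (not the paper's data).
pathogens = ["A", "A", "B", "C", "B", "C", "C"]
splits = list(leave_one_pathogen_out(pathogens))
for held_out, train_idx, test_idx in splits:
    print(held_out, train_idx, test_idx)
```

Because no protein from the held-out pathogen appears in the training indices, the test score for each split approximates performance on a pathogen the model has never seen, which is the scenario the abstract describes for a newly emerging pathogen.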
