Improving reverse vaccinology with a machine learning approach.

Reverse vaccinology aims to accelerate subunit vaccine design by rapidly predicting which proteins in a pathogenic bacterial proteome are putative protective antigens. Support vector machine classification is a machine learning approach that has been applied to numerous classification problems in the biological sciences but has not previously been incorporated into reverse vaccinology. A training data set of 136 bacterial protective antigens paired with 136 non-antigens was constructed, and bioinformatic tools were used to annotate these data with predicted protein features, many of which are associated with antigenicity (e.g. extracellular localization, signal peptides and B-cell epitopes). These annotations were used to train support vector machine classifiers that reached a maximum accuracy of 92% in discriminating protective antigens from non-antigens, as assessed by leave-tenth-out (10-fold) cross-validation. These accuracies were higher than those achieved when the training data were instead annotated with auto- and cross-covariance transformations of z-descriptors for hydrophobicity, molecular size and polarity, or when classification was performed with regression methods. To further validate the support vector machine classifiers, they were used to rank all proteins in six bacterial proteomes by predicted antigenicity. For all six proteomes, protective antigens from the training data were significantly recalled (enriched) among the 75 top-ranked proteins (Fisher's exact test, p < 0.05). This paper describes an improved workflow for reverse vaccinology studies and provides a benchmark training data set for evaluating future methodological improvements.
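
Leave-tenth-out cross-validation is more commonly called 10-fold cross-validation: the 272 labelled proteins are split into ten roughly equal parts, each part is held out once while a classifier is trained on the other nine, and accuracy is computed over the held-out predictions. The sketch below is illustrative only. It assumes a simple shuffled (unstratified) partition, accuracy pooled over all held-out samples, and a caller-supplied `train_and_predict` hook (a hypothetical name standing in for an SVM trainer). The abstract does not state how folds were drawn or how per-fold accuracies were combined.

```python
import random
from typing import Callable, List, Sequence

# A trainer takes training features/labels and returns a predictor for one sample.
Predictor = Callable[[Sequence[float]], bool]
Trainer = Callable[[List[Sequence[float]], List[bool]], Predictor]


def leave_tenth_out_accuracy(
    features: Sequence[Sequence[float]],
    labels: Sequence[bool],
    train_and_predict: Trainer,
    n_folds: int = 10,
    seed: int = 0,
) -> float:
    """Pooled accuracy from n-fold ("leave-tenth-out" when n_folds=10) cross-validation."""
    indices = list(range(len(labels)))
    random.Random(seed).shuffle(indices)  # assumption: unstratified random folds
    # Deal shuffled indices round-robin so fold sizes differ by at most one.
    folds = [indices[f::n_folds] for f in range(n_folds)]
    correct = 0
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in indices if i not in held]
        predict = train_and_predict(
            [features[i] for i in train_idx], [labels[i] for i in train_idx]
        )
        # Each sample is predicted exactly once, by a model that never saw it.
        correct += sum(predict(features[i]) == labels[i] for i in held_out)
    return correct / len(labels)
```

In the study's setting, `features` would be the annotated feature vectors of the 136 antigens and 136 non-antigens, and `train_and_predict` would fit an SVM on the nine training folds.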
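
The auto- and cross-covariance (ACC) transformation used for the comparison annotation turns a variable-length protein into a fixed-length vector. In the form commonly used for antigen prediction (for example by the VaxiJen server), each residue is replaced by its z-descriptors (hydrophobicity, molecular size, polarity), and for each pair of descriptors j, k and lag l the products z[i][j] * z[i+l][k] are averaged over the n - l residue pairs. The sketch below assumes that form with no mean-centering. The abstract gives neither the lag range nor the normalization, and the descriptor table is supplied by the caller rather than reproduced here.

```python
from typing import List, Mapping, Sequence


def acc_transform(
    sequence: str,
    descriptors: Mapping[str, Sequence[float]],
    max_lag: int,
) -> List[float]:
    """Auto- and cross-covariance features for lags 1..max_lag.

    For descriptor indices j, k and lag l over a sequence of length n:
        ACC[j][k](l) = sum_{i=0}^{n-l-1} z[i][j] * z[i+l][k] / (n - l)
    j == k gives the auto-covariance terms, j != k the cross-covariance terms.
    No mean-centering is applied (an assumption; some variants center first).
    """
    # Replace each residue by its descriptor vector (e.g. z1, z2, z3).
    z = [descriptors[residue] for residue in sequence]
    n = len(z)
    if not 1 <= max_lag < n:
        raise ValueError("max_lag must satisfy 1 <= max_lag < len(sequence)")
    d = len(z[0])
    features: List[float] = []
    for lag in range(1, max_lag + 1):
        for j in range(d):
            for k in range(d):
                total = sum(z[i][j] * z[i + lag][k] for i in range(n - lag))
                features.append(total / (n - lag))
    return features
```

With three z-descriptors and lags 1 to 8, for instance, every protein maps to 3 × 3 × 8 = 72 numbers regardless of its length. This fixed length is what lets a classifier with a fixed input size consume the result.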
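
The enrichment check can be read as a Fisher's exact test on a 2 × 2 table (top 75 vs. rest of the proteome, training antigen vs. not). Its one-sided form reduces to an upper-tail hypergeometric probability. The sketch assumes a one-sided test for enrichment (the abstract does not say one- or two-sided) and counts only training antigens actually present in the ranked proteome. `rank_enrichment` and its arguments are illustrative names. As a cross-check, SciPy's `scipy.stats.fisher_exact(table, alternative="greater")` applied to `[[hits, top_n - hits], [K - hits, N - K - top_n + hits]]` should give the same p-value.

```python
from math import comb
from typing import Sequence, Set, Tuple


def fisher_enrichment_p(
    proteome_size: int, antigens_in_proteome: int, top_n: int, hits: int
) -> float:
    """One-sided Fisher's exact p-value: P(X >= hits), X ~ Hypergeometric."""
    denominator = comb(proteome_size, top_n)
    max_hits = min(antigens_in_proteome, top_n)
    numerator = sum(
        comb(antigens_in_proteome, i)
        * comb(proteome_size - antigens_in_proteome, top_n - i)
        for i in range(hits, max_hits + 1)
    )
    # Exact integer arithmetic until this final division.
    return numerator / denominator


def rank_enrichment(
    ranked_ids: Sequence[str], known_antigens: Set[str], top_n: int = 75
) -> Tuple[int, float]:
    """Count training antigens among the top_n ranked proteins and test for enrichment."""
    hits = sum(1 for pid in ranked_ids[:top_n] if pid in known_antigens)
    present = sum(1 for pid in ranked_ids if pid in known_antigens)
    return hits, fisher_enrichment_p(len(ranked_ids), present, top_n, hits)
```

Here `ranked_ids` would be one proteome ordered by SVM score and `known_antigens` the training antigens. The abstract reports p < 0.05 for all six proteomes but gives no per-proteome counts, so none are reproduced here.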
