BMC Bioinformatics BioMed Central Methodology article VaxiJen: a server for prediction of protective antigens, tumour

Background

Vaccine development in the post-genomic era often begins with the in silico screening of genome information, with the most probable protective antigens being predicted rather than requiring causative microorganisms to be grown. Despite the obvious advantages of this approach, such as speed and cost efficiency, its success remains dependent on the accuracy of antigen prediction. Most approaches use sequence alignment to identify antigens. This is problematic for several reasons. Some proteins lack obvious sequence similarity, although they may share similar structures and biological properties. The antigenicity of a sequence may be encoded in a subtle and recondite manner not amenable to direct identification by sequence alignment. The discovery of truly novel antigens will be frustrated by their lack of similarity to antigens of known provenance. To overcome the limitations of alignment-dependent methods, we propose a new alignment-free approach for antigen prediction, which is based on auto cross covariance (ACC) transformation of protein sequences into uniform vectors of principal amino acid properties.

Results

Bacterial, viral and tumour protein datasets were used to derive models for prediction of whole protein antigenicity. Every set consisted of 100 known antigens and 100 non-antigens. The derived models were tested by internal leave-one-out cross-validation and by external validation using test sets. An additional five training sets for each class of antigens were used to test the stability of the discrimination between antigens and non-antigens. The models performed well in both validations, showing prediction accuracy of 70% to 89%. The models were implemented in a server, which we call VaxiJen.

Conclusion

VaxiJen is the first server for alignment-independent prediction of protective antigens. It was developed to allow antigen classification based solely on the physicochemical properties of proteins, without recourse to sequence alignment. The server can be used on its own or in combination with alignment-based prediction methods. It is freely available online at the URL: http://www.jenner.ac.uk/VaxiJen.
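To illustrate how an ACC transformation turns sequences of different lengths into vectors of uniform length, the following is a minimal sketch rather than the server's actual implementation. It assumes the common form of the transform, in which each term averages the product of property j at position i and property k at position i + lag over the sequence. The function name `acc_transform`, the parameter `max_lag`, and the two-property `TOY_DESCRIPTORS` table are hypothetical; the toy values are placeholders, not published amino acid descriptors.

```python
from typing import Dict, List, Sequence

# Hypothetical toy descriptor table (NOT real amino acid property values):
# each residue symbol maps to a tuple of numeric properties.
TOY_DESCRIPTORS: Dict[str, Sequence[float]] = {
    "A": (1.0, 0.0),
    "B": (0.0, 1.0),
    "C": (-1.0, -1.0),
}


def acc_transform(
    sequence: str,
    descriptors: Dict[str, Sequence[float]],
    max_lag: int,
) -> List[float]:
    """Return auto and cross covariance terms for lags 1..max_lag.

    For each lag and each ordered property pair (j, k), the term is
    sum_i z[j][i] * z[k][i + lag] / (n - lag), where n is the sequence length.
    The output length is max_lag * d * d for d properties, independent of n.
    """
    n = len(sequence)
    if max_lag < 1 or max_lag >= n:
        raise ValueError("max_lag must satisfy 1 <= max_lag < len(sequence)")

    # Look up the property vector of every residue in the sequence.
    props = [descriptors[residue] for residue in sequence]
    d = len(props[0])

    features: List[float] = []
    for lag in range(1, max_lag + 1):
        for j in range(d):
            for k in range(d):
                total = sum(
                    props[i][j] * props[i + lag][k] for i in range(n - lag)
                )
                features.append(total / (n - lag))
    return features


if __name__ == "__main__":
    short_vec = acc_transform("ABCA", TOY_DESCRIPTORS, max_lag=2)
    long_vec = acc_transform("ABCABCCABA", TOY_DESCRIPTORS, max_lag=2)
    # Both vectors have the same length despite different sequence lengths.
    print(len(short_vec), len(long_vec))
```

Because the vector length depends only on the number of properties and the maximum lag, the resulting vectors can be fed directly to a standard two-class discriminant model, with no alignment step.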
