AllerTOP - a server for in silico prediction of allergens

Background: Allergy is a form of hypersensitivity to normally innocuous substances, such as dust, pollen, foods or drugs. Allergens are small antigens that commonly provoke an IgE antibody response. There are two types of bioinformatics-based allergen prediction. The first approach follows FAO/WHO Codex alimentarius guidelines and searches for sequence similarity. The second approach is based on identifying conserved allergenicity-related linear motifs. Both approaches assume that allergenicity is a linearly coded property. In the present study, we applied auto- and cross-covariance (ACC) pre-processing to sets of known allergens, developing alignment-independent models for allergen recognition based on the main chemical properties of amino acid sequences.

Results: A set of 684 food, 1,156 inhalant and 555 toxin allergens was collected from several databases. A set of non-allergens from the same species was selected to mirror the allergen set. The amino acids in the protein sequences were described by three z-descriptors (z1, z2 and z3), and the sequences were converted into uniform vectors by ACC transformation. Each protein was represented as a vector of 45 variables. Five machine learning methods for classification were applied to derive models for allergen prediction: discriminant analysis by partial least squares (DA-PLS), logistic regression (LR), decision tree (DT), naïve Bayes (NB) and k nearest neighbours (kNN). The best performing model was derived by kNN at k = 3. It was optimized, cross-validated and implemented in a server named AllerTOP, freely accessible at http://www.pharmfac.net/allertop. AllerTOP also predicts the most probable route of exposure. AllerTOP outperforms other servers for allergen prediction, with 94% sensitivity.

Conclusions: AllerTOP is the first alignment-free server for in silico prediction of allergens based on the main physicochemical properties of proteins. Significantly, in addition to allergenicity, AllerTOP is able to predict the route of allergen exposure: food, inhalant or toxin.
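The pipeline described in the abstract (three descriptors per residue, ACC transformation into a 45-variable vector, then kNN at k = 3) can be illustrated with a minimal Python sketch. It is not the AllerTOP implementation. Several details are assumptions not stated in the abstract: a maximum lag of 5 (chosen because 3 × 3 × 5 = 45), normalisation of each covariance term by the number of residue pairs, and Euclidean distance for the neighbour search. The function names `acc_transform` and `knn_predict` are hypothetical, and the descriptor table holds placeholder numbers, not real z-scale values.

```python
from collections import Counter
from math import dist

def acc_transform(z, max_lag=5):
    """Turn a variable-length sequence into a fixed-length vector.

    z is a list of per-residue descriptor tuples, e.g. (z1, z2, z3).
    For each lag and each ordered descriptor pair (j, k), the mean of
    z[i][j] * z[i + lag][k] over the sequence is computed; j == k gives
    the auto-covariance terms and j != k the cross-covariance terms.
    max_lag=5 is an assumption that yields 3 * 3 * 5 = 45 variables.
    """
    n, d = len(z), len(z[0])
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(d):
            for k in range(d):
                total = sum(z[i][j] * z[i + lag][k] for i in range(n - lag))
                features.append(total / (n - lag))
    return features

def knn_predict(query, train_vectors, train_labels, k=3):
    """Majority vote among the k training vectors nearest to query.

    Euclidean distance is an assumption; the abstract does not name
    the metric.
    """
    order = sorted(range(len(train_vectors)),
                   key=lambda i: dist(query, train_vectors[i]))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Placeholder descriptors for three residues (NOT real z-scale values).
PLACEHOLDER_Z = {"A": (0.1, -1.0, 0.5), "C": (0.7, 0.2, -0.3), "D": (-0.4, 0.9, 0.0)}

vector = acc_transform([PLACEHOLDER_Z[aa] for aa in "ACDACDAC"])
print(len(vector))  # 45
```

Because the output length depends only on the number of descriptors and the maximum lag, proteins of any length above `max_lag` map to vectors of the same size, which is what allows an alignment-free classifier such as kNN to compare them directly.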
