Determinants of antigenicity and specificity in immune response for protein sequences

Background: Target-specific antibodies are pivotal for the design of vaccines, immunodiagnostic tests, proteomic studies for cancer biomarker discovery, identification of protein-DNA and other interactions, and small- and large-scale biochemical assays. It is therefore important to understand which properties of protein sequences determine antigenicity, and to identify the small peptide epitopes and large regions in the linear sequence of a protein whose use results in specific antibodies.

Results: Our analysis of protein properties suggested that sequence composition, combined with evolutionary information, predicted secondary structure and solvent accessibility, is sufficient to predict successful peptide epitopes. Antigenicity and specificity in immune response were also found to depend on epitope length. We trained the B-Cell Epitope Oracle (BEOracle), a support vector machine (SVM) classifier, to identify continuous B-cell epitopes using these protein properties as learning features. BEOracle achieved an F1-measure of 81.37% on a large validation set. It outperformed both the classical propensity-based methods and more sophisticated methods such as BCPred and Bepipred for B-cell epitope prediction. BEOracle also identified peptides for the ChIP-grade antibodies from the modENCODE/ENCODE projects with 96.88% accuracy. High BEOracle scores for peptides showed some correlation with antibody intensity in immunofluorescence studies on fly embryos. Finally, a second SVM classifier, the B-Cell Region Oracle (BROracle), was trained with the BEOracle scores as features to predict the performance of antibodies generated with large protein regions. BROracle achieved accuracies ranging from 63.88% to 75.26% on a validation set with immunofluorescence, immunohistochemistry, protein array and western blot results from the Protein Atlas database.

Conclusions: Together, our results suggest that antigenicity is a local property of protein sequences and that composition, secondary structure, solvent accessibility and evolutionary conservation are the determinants of antigenicity and specificity in immune response. Moreover, specificity in immune response could be predicted for large protein regions without knowledge of the protein tertiary structure or of the presence of discontinuous epitopes. The dataset prepared in this work and the classifier models are available for download at https://sites.google.com/site/oracleclassifiers/.
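As a rough illustration of two ideas mentioned in the abstract, the sketch below computes an amino acid composition vector for a peptide and the F1-measure for binary predictions. It is not the BEOracle implementation: the function names `composition_features` and `f1_measure` are hypothetical, and the real classifier also uses evolutionary information, predicted secondary structure and solvent accessibility, which are omitted here.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_features(peptide: str) -> list[float]:
    """Fraction of each standard amino acid in the peptide (hypothetical helper)."""
    counts = Counter(peptide.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        # No standard residues: return an all-zero vector rather than divide by zero.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def f1_measure(y_true: list[int], y_pred: list[int]) -> float:
    """F1 = harmonic mean of precision and recall; labels are 1 (epitope) or 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

A composition vector of this kind could serve as one block of the input features to an SVM, and `f1_measure` shows how a score such as the reported 81.37% is defined in terms of true positives, false positives and false negatives.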
