Ab-initio conformational epitope structure prediction using genetic algorithm and SVM for vaccine design

BACKGROUND AND OBJECTIVE T-cell epitope structure identification is a significant and challenging immunoinformatics problem within epitope-based vaccine design. Epitopes, or antigenic peptides, are sets of amino acids that bind to Major Histocompatibility Complex (MHC) molecules. The resulting peptide-MHC complexes are presented by Antigen Presenting Cells for inspection by T-cells. MHC-binding epitopes are responsible for triggering the immune response to antigens. An epitope's three-dimensional (3D) molecular structure (i.e., its tertiary structure) reflects its function. Therefore, identifying the structure of MHC class-II epitopes is a significant step towards epitope-based vaccine design and towards understanding the immune system.

METHODS In this paper, we propose a new technique, a Genetic Algorithm for Predicting the Epitope Structure (GAPES), which predicts the structure of MHC class-II epitopes from their sequence. The proposed elitist genetic algorithm for predicting the epitope's tertiary structure is based on the ab initio Empirical Conformational Energy Program for Peptides (ECEPP) force field model. The secondary structure prediction technique we developed relies on the Ramachandran plot. We used two alignment algorithms, ROSS and TM-score, and applied four different alignment approaches to calculate the similarity scores of the dataset under test. We used a support vector machine (SVM) classifier to evaluate prediction performance.

RESULTS Prediction accuracy and the area under the receiver operating characteristic (ROC) curve (AUC) were calculated as performance measures. The calculations were performed on twelve similarity-reduced datasets from the Immune Epitope Database (IEDB) and on a large dataset of peptide-binding affinities to HLA-DRB1*0101. The results showed that GAPES was reliable and accurate: it achieved an average prediction accuracy of 93.50% and an average AUC of 0.974 on the IEDB datasets, and an accuracy of 95.125% and an AUC of 0.987 on the HLA-DRB1*0101 allele of the Wang benchmark dataset.

CONCLUSIONS The results indicate that the proposed technique, GAPES, is promising: it can help researchers predict protein structure and assist in the intelligent design of new epitope-based vaccines.
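As a rough illustration of the elitist genetic algorithm idea named in the methods, the sketch below evolves a list of backbone dihedral angles (phi, psi) per residue and copies the lowest-energy conformations unchanged into each new generation. It is not the GAPES implementation. The energy function is a hypothetical toy stand-in for a force field such as ECEPP, and every name and parameter here (`toy_energy`, `elitist_ga`, the target angles, population size, mutation rate, the 9-residue length) is an assumption chosen for the example.

```python
import random

# Hypothetical stand-in for a force-field energy such as ECEPP: it only
# penalises deviation from one fixed (phi, psi) pair, given in degrees.
TARGET = (-60.0, -45.0)


def wrap(angle):
    """Map an angle in degrees into the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def toy_energy(conformation):
    """Sum of squared angular deviations from TARGET (lower is better)."""
    return sum(
        wrap(phi - TARGET[0]) ** 2 + wrap(psi - TARGET[1]) ** 2
        for phi, psi in conformation
    )


def random_conformation(n_residues, rng):
    """One random (phi, psi) pair per residue."""
    return [(rng.uniform(-180, 180), rng.uniform(-180, 180))
            for _ in range(n_residues)]


def crossover(parent_a, parent_b, rng):
    """Single-point crossover along the residue chain."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]


def mutate(conformation, rng, rate=0.2, step=30.0):
    """Perturb some residues' angles with Gaussian noise."""
    mutated = []
    for phi, psi in conformation:
        if rng.random() < rate:
            phi = wrap(phi + rng.gauss(0.0, step))
            psi = wrap(psi + rng.gauss(0.0, step))
        mutated.append((phi, psi))
    return mutated


def tournament(population, rng, k=3):
    """Pick the lowest-energy individual among k random candidates."""
    return min(rng.sample(population, k), key=toy_energy)


def elitist_ga(n_residues, pop_size=40, generations=60, n_elite=2, seed=0):
    """Return the best conformation and the best energy per generation."""
    rng = random.Random(seed)
    population = [random_conformation(n_residues, rng)
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=toy_energy)
        history.append(toy_energy(population[0]))
        # Elitism: the n_elite best individuals survive unchanged.
        next_population = population[:n_elite]
        while len(next_population) < pop_size:
            child = crossover(tournament(population, rng),
                              tournament(population, rng), rng)
            next_population.append(mutate(child, rng))
        population = next_population
    best = min(population, key=toy_energy)
    history.append(toy_energy(best))
    return best, history


if __name__ == "__main__":
    best, history = elitist_ga(n_residues=9)
    print(f"initial best energy: {history[0]:.1f}")
    print(f"final best energy:   {history[-1]:.1f}")
```

Because the elite individuals are never mutated, the best energy recorded in `history` cannot get worse from one generation to the next. That guarantee is the usual reason for choosing an elitist scheme when each energy evaluation is expensive.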
