sNebula, a network-based algorithm to predict binding between human leukocyte antigens and peptides

Understanding the binding between human leukocyte antigens (HLAs) and peptides is important for understanding how the immune system functions. Because measuring the binding between large numbers of HLAs and peptides is time-consuming and costly, computational methods, including machine learning models and network approaches, have been developed to predict HLA-peptide binding. However, the existing methods have several limitations. We developed a network-based algorithm called sNebula to address these limitations. We curated qualitative Class I HLA-peptide binding data and evaluated the prediction performance of sNebula on this dataset using leave-one-out cross-validation and five-fold cross-validations. The algorithm can make predictions not only for peptides of different lengths and for different types of HLAs, but also for peptides or HLAs that have no existing binding data. We believe sNebula is an effective method for predicting HLA-peptide binding and can thus improve our understanding of the immune system.
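As an illustration of the two evaluation schemes named above, the following is a minimal sketch of how qualitative HLA-peptide binding pairs could be partitioned for five-fold and leave-one-out cross-validation. It is not the sNebula implementation: the function names, the `(hla, peptide, label)` tuple layout, and the toy records are all assumptions made for this example, and the prediction step itself is left out.

```python
import random


def k_fold_splits(pairs, k=5, seed=0):
    """Yield (train, test) lists so that each pair is held out exactly once.

    `pairs` is any sequence of records; here we assume hypothetical
    (hla, peptide, label) tuples with label 1 = binder, 0 = non-binder.
    """
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    # Deal the shuffled records into k folds of near-equal size.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        yield train, test


def leave_one_out_splits(pairs):
    """Leave-one-out is the special case where k equals the number of pairs."""
    pairs = list(pairs)
    return k_fold_splits(pairs, k=len(pairs))


# Toy placeholder records (not real binding data): 10 HLA-peptide pairs.
toy_pairs = [(f"HLA_{i % 3}", f"pep_{i}", i % 2) for i in range(10)]

five_fold = list(k_fold_splits(toy_pairs, k=5))
loo = list(leave_one_out_splits(toy_pairs))
```

In an actual evaluation, a model would be built from each `train` list and scored on the matching `test` list, with the scores then aggregated across folds.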
