DeepSeqPan, a novel deep convolutional neural network model for pan-specific class I HLA-peptide binding affinity prediction

Interactions between human leukocyte antigens (HLAs) and peptides play a critical role in the human immune system. Accurate computational prediction of HLA-binding peptides can be used for peptide drug discovery. Currently, the best prediction algorithms are neural network-based pan-specific models, which take advantage of the large amount of data available across HLA alleles. However, current pan-specific models all rely on pseudo-sequence encoding to model the binding context, an encoding based on 34 positions identified from HLA protein-peptide bound structures in earlier work. In this work, we propose a novel deep convolutional neural network (DCNN) model for HLA-peptide binding prediction, in which the encodings of both the HLA sequence and the binding context are learned by the network itself, without requiring HLA-peptide bound structure information. Our DCNN model is also characterized by its binding context extraction layer and its dual outputs: a binding affinity output and a binding probability output. Evaluation on public benchmark datasets shows that DeepSeqPan, trained without HLA structural information, achieves state-of-the-art performance on a large number of HLA alleles with good generalization capability. Because the model needs only raw sequences from HLA-peptide binding pairs, it can be applied to binding prediction for HLAs without structure information, and it can also be applied to other protein binding problems such as protein-DNA and protein-RNA binding. The implementation code and trained models are freely available at https://github.com/pcpLiu/DeepSeqPan.
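
To illustrate what "only raw sequences" can mean in practice, the following is a minimal sketch of turning an HLA-peptide pair into fixed-size one-hot matrices that a convolutional network could consume. It is not taken from the DeepSeqPan repository: the function names, the padding scheme, the maximum lengths, and the example sequences are all assumptions made for illustration, and the actual encoding used by the authors may differ.

```python
from typing import List, Tuple

# The 20 standard amino acids; any other symbol is encoded as an all-zero row.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(seq: str, max_len: int) -> List[List[int]]:
    """Encode a raw sequence as a max_len x 20 one-hot matrix.

    Hypothetical helper: sequences shorter than max_len are padded at the
    end with all-zero rows; longer sequences are rejected.
    """
    if len(seq) > max_len:
        raise ValueError(f"sequence length {len(seq)} exceeds max_len {max_len}")
    rows = []
    for aa in seq.upper():
        row = [0] * len(AMINO_ACIDS)
        idx = AA_INDEX.get(aa)
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    # Zero-pad so every sequence yields a matrix of the same shape.
    rows.extend([0] * len(AMINO_ACIDS) for _ in range(max_len - len(seq)))
    return rows


def encode_pair(
    hla_seq: str, peptide: str, hla_len: int, pep_len: int
) -> Tuple[List[List[int]], List[List[int]]]:
    """Encode an HLA sequence and a peptide as two separate input matrices."""
    return one_hot_encode(hla_seq, hla_len), one_hot_encode(peptide, pep_len)


if __name__ == "__main__":
    # Arbitrary placeholder strings, not real benchmark data.
    hla_matrix, pep_matrix = encode_pair("GSHSMRYF", "SIINFEKL", 10, 15)
    print(len(hla_matrix), len(hla_matrix[0]))
    print(len(pep_matrix), len(pep_matrix[0]))
```

In a design like the one the abstract describes, two such matrices would feed separate encoder branches whose outputs are combined before the two prediction heads; that part is omitted here because the abstract does not specify the layer details.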
