USMPep: universal sequence models for major histocompatibility complex binding affinity prediction

Background: Immunotherapy is a promising route towards personalized cancer treatment. A key algorithmic challenge in this process is to decide whether a given peptide (neoepitope) binds to the major histocompatibility complex (MHC). This is an active area of research, and many MHC binding prediction algorithms can predict the binding affinity of a given peptide with high accuracy. However, most state-of-the-art approaches use complicated training and model selection procedures, are restricted to peptides of a certain length, and/or rely on heuristics.

Results: We put forward USMPep, a simple recurrent neural network that reaches the performance of state-of-the-art approaches on MHC class I binding prediction, both on IEDB benchmark datasets and on the very recent HPV dataset, using a single generic architecture and even a single set of hyperparameters. Moreover, the algorithm is already competitive as a single model trained from scratch, while ensembling multiple regressors and language model pretraining can still improve performance slightly. Applied directly to MHC class II binding prediction, the approach shows solid performance despite limited training data.

Conclusions: We demonstrate that competitive performance in MHC binding affinity prediction can be reached with a standard architecture and training procedure without relying on any heuristics.
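The abstract describes USMPep only at a high level: a recurrent network that reads a peptide of any length and regresses a binding score, optionally averaged over an ensemble of regressors. The block below is a minimal, hypothetical sketch of that idea in plain Python, not the authors' implementation. It assumes a plain Elman RNN with random, untrained weights as a stand-in for the paper's network; the names `TinyRNNRegressor` and `ensemble_predict`, the hidden size, and the sigmoid output are illustrative assumptions. Training and language model pretraining are omitted.

```python
import math
import random

# The 20 standard amino acids; a peptide is fed to the network one residue at a time.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


class TinyRNNRegressor:
    """Hypothetical, untrained Elman RNN mapping a peptide to a score in (0, 1).

    Stand-in for illustration only: weights are random and no training is done.
    """

    def __init__(self, hidden_size: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        n_in = len(AMINO_ACIDS)
        self.hidden_size = hidden_size
        # Input-to-hidden weights: one column per amino acid (one-hot input).
        self.w_xh = [[rng.gauss(0, 0.3) for _ in range(n_in)] for _ in range(hidden_size)]
        # Hidden-to-hidden recurrent weights.
        self.w_hh = [[rng.gauss(0, 0.3) for _ in range(hidden_size)] for _ in range(hidden_size)]
        self.b_h = [0.0] * hidden_size
        # Linear read-out from the final hidden state.
        self.w_out = [rng.gauss(0, 0.3) for _ in range(hidden_size)]
        self.b_out = 0.0

    def predict(self, peptide: str) -> float:
        h = [0.0] * self.hidden_size
        for aa in peptide:
            j = AA_INDEX[aa]  # one-hot input: only column j of w_xh contributes
            # The comprehension reads the previous h before h is rebound.
            h = [
                math.tanh(
                    self.w_xh[i][j]
                    + sum(self.w_hh[i][k] * h[k] for k in range(self.hidden_size))
                    + self.b_h[i]
                )
                for i in range(self.hidden_size)
            ]
        z = sum(w * hk for w, hk in zip(self.w_out, h)) + self.b_out
        return 1.0 / (1.0 + math.exp(-z))  # squash the read-out to (0, 1)


def ensemble_predict(models, peptide: str) -> float:
    """Average the predictions of several independently initialised regressors."""
    return sum(m.predict(peptide) for m in models) / len(models)


if __name__ == "__main__":
    models = [TinyRNNRegressor(seed=s) for s in range(5)]
    # 8-, 9- and 13-residue peptides go through the same weights.
    for pep in ["SIINFEKL", "GILGFVFTL", "PKYVKQNTLKLAT"]:
        print(pep, round(ensemble_predict(models, pep), 3))
```

Because the recurrence consumes one residue per step, the same weights apply to peptides of any length without length-specific models. Averaging several independently initialised regressors is the simplest form of the ensembling mentioned in the abstract.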
