Learning MHC I–peptide binding

MOTIVATION AND RESULTS: Motivated by the ability of a simple threading approach to predict MHC I-peptide binding, we developed an improved structure-based model whose parameters can be estimated from additional sources of data about MHC-peptide binding. The original threading approach used only the known 3D structures of a small number of MHC-peptide complexes. We added three further sources of information on peptide-MHC binding: (1) MHC class I sequences; (2) known binding energies for a large number of MHC-peptide complexes; and (3) an even larger binary dataset of strong binders (epitopes) and non-binders (peptides that have a low affinity for a particular MHC molecule). Our model significantly outperforms the standard threading approach in binding energy prediction. In our approach, which we call adaptive double threading, the parameters of the threading model are learnable, and both MHC and peptide sequences can be threaded onto structures of other alleles. These two properties make the model suitable for predicting binding for alleles for which little or no data are available beyond the sequence, including alleles for which no 3D structure is available. The ability to generalize beyond the MHC types for which training data are available also separates our approach from epitope prediction methods that treat MHC alleles as symbolic types rather than biological sequences. We used the trained binding energy predictor to study viral infections in 246 HIV patients from the West Australian cohort, and over 1000 HIV clade B sequences from the Los Alamos National Laboratory database, capturing the course of HIV evolution over the last 20 years. Finally, we illustrate short-, medium-, and long-term adaptation of HIV to the human immune system.

AVAILABILITY: http://www.research.microsoft.com/~jojic/hlaBinding.html.
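To make the idea of a threading-style energy with learnable parameters more concrete, the following is a minimal Python sketch, not the authors' model. It scores a peptide against an MHC sequence by summing a pairwise contact potential over position pairs that are close in a template structure, with an optional weight per contact. Everything specific here is an assumption made for illustration: the function names, the toy potential values (which are not real contact energies), the 4.0 distance threshold, and the form of the weights.

```python
# Illustrative sketch only: toy numbers, hypothetical names, not the paper's model.

HYDROPHOBIC = set("AVLIMFWC")  # assumed grouping for the toy potential


def toy_potential(a, b):
    """Toy pairwise energy between two residues (made-up values)."""
    if a in HYDROPHOBIC and b in HYDROPHOBIC:
        return -1.0  # favorable hydrophobic contact
    if a in HYDROPHOBIC or b in HYDROPHOBIC:
        return 0.5   # mildly unfavorable mixed contact
    return 0.0


def threading_energy(mhc_seq, peptide, distances, weights=None, d_thresh=4.0):
    """Sum weighted pairwise potentials over contacts in a template structure.

    distances: dict mapping (mhc_index, peptide_index) to the distance
        between those two positions in the template structure.
    weights: optional dict of per-contact weights; a missing entry means 1.0.
        With weights=None this reduces to plain, non-adaptive threading.
    d_thresh: assumed contact cutoff; pairs farther apart are ignored.
    """
    energy = 0.0
    for (i, j), d in distances.items():
        if d > d_thresh:
            continue  # positions are not in contact in the template
        w = 1.0 if weights is None else weights.get((i, j), 1.0)
        energy += w * toy_potential(mhc_seq[i], peptide[j])
    return energy


if __name__ == "__main__":
    # Hypothetical three-residue MHC fragment and two-residue peptide.
    template = {(0, 0): 3.0, (1, 1): 3.5, (2, 0): 6.0, (2, 1): 3.9}
    print(threading_energy("AKV", "LD", template))
    print(threading_energy("AKV", "LD", template, weights={(0, 0): 2.0}))
```

Because both sequences are passed in as arguments and only the distances come from the template, the same template can be reused to score a different MHC sequence, which is the sense in which both molecules are "threaded". In a learnable variant, the weights (and possibly the potential itself) would be fitted to measured binding energies rather than set by hand.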
