Derivation of an amino acid similarity matrix for peptide:MHC binding and its application as a Bayesian prior

Background

Experts in peptide:MHC binding studies are often able to estimate the impact of a single residue substitution based on a heuristic understanding of amino acid similarity in an experimental context. Our aim is to quantify this measure of similarity to improve peptide:MHC binding prediction methods. This should help compensate for holes and bias in the sequence space coverage of existing peptide binding datasets.

Results

Here, a novel amino acid similarity matrix (PMBEC) is directly derived from the binding affinity data of combinatorial peptide mixtures. Like BLOSUM62, this matrix captures well-known physicochemical properties of amino acid residues. However, PMBEC differs markedly from existing matrices in cases where residue substitution involves a reversal of electrostatic charge. To demonstrate its usefulness, we have developed a new peptide:MHC class I binding prediction method, using the matrix as a Bayesian prior. We show that the new method can compensate for missing information on specific residues in the training data. We also carried out a large-scale benchmark, and its results indicate that prediction performance of the new method is comparable to that of the best neural network based approaches for peptide:MHC class I binding.

Conclusion

A novel amino acid similarity matrix has been derived for peptide:MHC binding interactions. One prominent feature of the matrix is that it disfavors substitution of residues with opposite charges. Given that the matrix was derived from experimentally determined peptide:MHC binding affinity measurements, this feature is likely shared by all peptide:protein interactions. In addition, we have demonstrated the usefulness of the matrix as a Bayesian prior in an improved scoring-matrix based peptide:MHC class I prediction method. A software implementation of the method is available at: http://www.mhc-pathway.net/smmpmbec.
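To illustrate the general idea of using a similarity matrix as a Bayesian prior in a scoring-matrix method, the following is a minimal sketch and not the authors' implementation. It assumes a toy four-residue alphabet, a single peptide position, a made-up prior covariance (`PRIOR_COV`, not the PMBEC values), made-up binding scores, and a hypothetical helper `fit_map`. The weights are the maximum a posteriori estimate under a zero-mean Gaussian prior, which reduces to ridge regression when the covariance is the identity.

```python
import numpy as np

# Toy alphabet (assumption): two acidic and two basic residues.
ALPHABET = "DEKR"

# Hypothetical prior covariance, NOT the PMBEC matrix: residues with the
# same charge are positively correlated, opposite charges negatively.
PRIOR_COV = np.array([
    [ 1.0,  0.8, -0.5, -0.5],
    [ 0.8,  1.0, -0.5, -0.5],
    [-0.5, -0.5,  1.0,  0.8],
    [-0.5, -0.5,  0.8,  1.0],
])


def one_hot(residues):
    """Encode single residues as one-hot rows over ALPHABET."""
    X = np.zeros((len(residues), len(ALPHABET)))
    for row, aa in enumerate(residues):
        X[row, ALPHABET.index(aa)] = 1.0
    return X


def fit_map(X, y, prior_cov, lam=0.1):
    """MAP weights for a linear model with prior w ~ N(0, prior_cov).

    Solves (X^T X + lam * prior_cov^{-1}) w = X^T y.
    """
    precision = np.linalg.inv(prior_cov)
    return np.linalg.solve(X.T @ X + lam * precision, X.T @ y)


# Made-up training data in which residue E is never observed.
train_residues = ["D", "K", "R"]
train_scores = np.array([2.0, -1.0, -1.2])
X = one_hot(train_residues)

w_flat = fit_map(X, train_scores, np.eye(len(ALPHABET)))  # no similarity information
w_sim = fit_map(X, train_scores, PRIOR_COV)               # similarity-informed prior

e = ALPHABET.index("E")
print("weight for unseen E, identity prior:  ", w_flat[e])
print("weight for unseen E, similarity prior:", w_sim[e])
```

With the identity prior the unobserved residue E receives no information from the data and its weight stays at the prior mean of zero. With the similarity-informed prior its weight is pulled toward that of D, the residue it is assumed to resemble, which is the kind of compensation for missing training data that the abstract describes.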
