Predicting the distance between antibody’s interface residue and antigen to recognize antigen types by support vector machine

In this paper, a machine learning approach, the support vector machine (SVM), is employed to predict the distance between antibody interface residues and the antigen in antigen–antibody complexes. The heavy chains, light chains and corresponding antigens of 37 antibodies are extracted from antibody–antigen complexes in the Protein Data Bank. A number of computational experiments, varying the distance range, the sequence patch size and the antigen class, are conducted to describe the distance between antibody interface residues and the antigen using antibody sequence information. The high prediction accuracy in both self-consistency and cross-validation tests indicates that sequence information derived from the antibody structure is highly informative for predicting the distance between antibody interface residues and the antigen. Furthermore, the antigen class is predicted by SVM from the composition of the residues that fall within each distance range, a result of some potential significance.
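The feature construction described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the one-hot window encoding, the `-` padding symbol, the example distance boundaries (4 and 8 Å) and all function names are assumptions introduced here. The sketch cuts an antibody sequence into fixed-size patches centred on each residue, turns each patch into a numeric vector, assigns a distance-range class label, and computes the residue composition of a group of residues.

```python
# Hypothetical sketch: names, encoding, and thresholds are illustrative assumptions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "-"  # assumed padding symbol for positions beyond the chain ends


def sequence_patches(sequence, patch_size):
    """Return one patch of length patch_size centred on each residue."""
    if patch_size < 1 or patch_size % 2 == 0:
        raise ValueError("patch_size must be a positive odd number")
    half = patch_size // 2
    padded = PAD * half + sequence + PAD * half
    return [padded[i:i + patch_size] for i in range(len(sequence))]


def one_hot(patch):
    """Encode a patch as 20 binary values per position (padding is all zeros)."""
    vector = []
    for residue in patch:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


def distance_class(distance, boundaries):
    """Map a residue-antigen distance to the index of its distance range."""
    for index, upper in enumerate(boundaries):
        if distance <= upper:
            return index
    return len(boundaries)


def composition(residues):
    """Fraction of each of the 20 residue types in a string of residues."""
    total = len(residues)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(aa) / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    patches = sequence_patches("ACDEF", 3)
    features = [one_hot(p) for p in patches]
    labels = [distance_class(d, [4.0, 8.0]) for d in (3.5, 6.0, 10.0, 7.9, 4.0)]
    print(patches, len(features[0]), labels)
```

Under these assumptions, the per-residue vectors from `one_hot` with labels from `distance_class` would form the training set for a distance-range classifier, and the vectors from `composition` would serve as inputs for antigen-class prediction. Either set could be passed to any SVM implementation.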
