Prediction of Carbohydrate Binding Sites on Protein Surfaces with 3-Dimensional Probability Density Distributions of Interacting Atoms

Non-covalent protein-carbohydrate interactions mediate molecular targeting in many biological processes. Predicting non-covalent carbohydrate binding sites on protein surfaces not only provides insight into the functions of the query proteins; knowledge of the key carbohydrate-binding residues can also suggest site-directed mutagenesis experiments, aid the design of therapeutics that target carbohydrate-binding proteins, and guide the engineering of protein-carbohydrate interactions. In this work, we show that non-covalent carbohydrate binding sites on protein surfaces can be predicted with relatively high accuracy when the structures of the query proteins are known. The predictions are based on a novel scheme for encoding the three-dimensional probability density maps that describe the distributions of 36 non-covalently interacting atom types around protein surfaces. One machine learning model was trained for each of the 30 protein atom types. These models predict tentative carbohydrate binding sites on query proteins by recognizing the interacting-atom distribution patterns that, in known protein structures, are characteristic of carbohydrate binding sites. The predictions for all protein atom types were then integrated, according to their normalized prediction confidence levels, into surface patches representing tentative carbohydrate binding sites. The predictors were benchmarked by 10-fold cross-validation on 497 non-redundant proteins with known carbohydrate binding sites and further evaluated on an independent test set of 108 proteins. On the independent test set, the residue-based Matthews correlation coefficient (MCC) was 0.45, with a prediction precision of 0.45 and a sensitivity (recall) of 0.49. In addition, the trained predictors were applied to 111 unbound carbohydrate-binding protein structures, that is, structures determined in the absence of carbohydrate ligands; the overall prediction MCC for this set was 0.49. Independent tests on anti-carbohydrate antibodies showed that their carbohydrate antigen binding sites were predicted with comparable accuracy. These results indicate that the predictors are among the best-performing carbohydrate binding site predictors reported to date.
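
The integration of per-atom-type predictions into surface patches is described above only at a high level. The Python sketch below shows one way such an integration could work; it is an illustration under stated assumptions, not the authors' implementation. The input tuple layout, the per-type z-score normalization, the confidence cutoff `z_cutoff`, the single-linkage distance `link_dist` (in Å), the minimum patch size `min_atoms`, and the helper names `normalize_per_type` and `predict_patches` are all hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical input: one tuple per protein surface atom,
# (residue_id, atom_type, (x, y, z), raw_score), where raw_score is the output
# of the model trained for that atom's protein atom type.


def normalize_per_type(atoms):
    """Z-score raw scores separately within each atom type (assumed normalization)."""
    by_type = defaultdict(list)
    for _, atom_type, _, score in atoms:
        by_type[atom_type].append(score)
    stats = {}
    for atom_type, scores in by_type.items():
        sd = pstdev(scores)
        stats[atom_type] = (mean(scores), sd if sd > 0 else 1.0)
    return [
        (res, atom_type, xyz, (score - stats[atom_type][0]) / stats[atom_type][1])
        for res, atom_type, xyz, score in atoms
    ]


def predict_patches(atoms, z_cutoff=1.0, link_dist=6.0, min_atoms=3):
    """Group high-confidence atoms into spatial patches; return residue IDs per patch."""
    hits = [a for a in normalize_per_type(atoms) if a[3] >= z_cutoff]
    parent = list(range(len(hits)))  # union-find forest over the hits

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Single-linkage clustering: join any two hits within link_dist of each other.
    for i in range(len(hits)):
        for j in range(i + 1, len(hits)):
            if math.dist(hits[i][2], hits[j][2]) <= link_dist:
                parent[find(i)] = find(j)

    clusters = defaultdict(list)
    for i, hit in enumerate(hits):
        clusters[find(i)].append(hit)
    return [
        sorted({res for res, _, _, _ in members})
        for members in clusters.values()
        if len(members) >= min_atoms
    ]


# Toy example with two atom types and made-up scores.
atoms = [
    (10, "C", (0.0, 0.0, 0.0), 0.9),
    (11, "C", (3.0, 0.0, 0.0), 0.8),
    (30, "C", (40.0, 0.0, 0.0), 0.1),
    (31, "C", (45.0, 0.0, 0.0), 0.0),
    (12, "O", (0.0, 3.0, 0.0), 0.7),
    (50, "O", (20.0, 20.0, 20.0), 0.6),
    (32, "O", (40.0, 5.0, 0.0), 0.0),
    (33, "O", (45.0, 5.0, 0.0), 0.1),
]
print(predict_patches(atoms, z_cutoff=0.5))  # [[10, 11, 12]]
```

In this toy input, the three high-confidence atoms near the origin form one patch (residues 10, 11 and 12), while the isolated high-scoring atom of residue 50 is discarded because its cluster is smaller than `min_atoms`. Single-linkage clustering is used here only for simplicity; the abstract does not specify how high-confidence atoms were grouped.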

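The residue-based metrics quoted in the abstract follow the standard confusion-matrix definitions: precision = TP/(TP+FP), sensitivity (recall) = TP/(TP+FN), and MCC = (TP·TN − FP·FN)/√((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The Python sketch below computes them for a single protein. The function name `residue_metrics` is hypothetical, and whether the reported values were pooled over all residues of a test set or averaged per protein is an assumption left open here.

```python
import math


def residue_metrics(predicted, actual, all_residues):
    """Residue-based precision, sensitivity (recall), and Matthews correlation coefficient."""
    predicted, actual, all_residues = set(predicted), set(actual), set(all_residues)
    tp = len(predicted & actual)                 # binding residues correctly predicted
    fp = len(predicted - actual)                 # non-binding residues predicted as binding
    fn = len(actual - predicted)                 # binding residues that were missed
    tn = len(all_residues - predicted - actual)  # non-binding residues correctly excluded
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, recall, mcc


# Toy protein with 100 residues: 10 true binding residues, 10 predicted, 5 overlapping.
precision, recall, mcc = residue_metrics(
    predicted=range(6, 16), actual=range(1, 11), all_residues=range(1, 101)
)
print(precision, recall, round(mcc, 3))  # 0.5 0.5 0.444
```

Unlike precision and recall, MCC also depends on the true negatives, so the reported precision and sensitivity do not by themselves determine the reported MCC.
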