Carbohydrate‐binding protein identification by coupling structural similarity searching with binding affinity prediction

Carbohydrate‐binding proteins (CBPs) are potential biomarkers and drug targets. However, the interactions between carbohydrates and proteins are challenging to study both experimentally and computationally because of their low binding affinity, their high flexibility, and the fact that carbohydrates lack the linear sequence found in RNA, DNA, and proteins. Here, we describe a structure‐based function‐prediction technique called SPOT‐Struc that identifies carbohydrate‐recognizing proteins and their binding amino acid residues by combining the structural alignment program SPalign with binding affinity scoring from a knowledge‐based statistical potential built on the distance‐scaled finite‐ideal gas reference state (DFIRE). Leave‐one‐out cross‐validation of the method on 113 carbohydrate‐binding domains and 3442 noncarbohydrate‐binding proteins yields a Matthews correlation coefficient of 0.56 for SPalign alone and 0.63 for SPOT‐Struc (SPalign + binding affinity scoring) for CBP prediction. SPOT‐Struc has a high positive predictive value (79% of all positive CBP predictions are correct) with reasonable sensitivity (52% of all CBPs are predicted as positive). The sensitivity changed only slightly when the method was applied to 31 APO (unbound) structures found in the Protein Data Bank (14/31 for APO versus 15/31 for HOLO). The results of SPOT‐Struc do not change significantly when highly homologous templates are used. SPOT‐Struc predicted 19 out of 2076 structural genomics targets as CBPs. In particular, one uncharacterized protein in Bacillus subtilis (1oq1A) was matched to galectin‐9 from Mus musculus. Thus, SPOT‐Struc is useful for uncovering novel carbohydrate‐binding proteins. SPOT‐Struc is available at http://sparks-lab.org. © 2014 Wiley Periodicals, Inc.
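To make the reported metrics concrete, the following minimal sketch shows how sensitivity, positive predictive value, and the Matthews correlation coefficient follow from a confusion matrix. The counts used here are not reported in the abstract: they are hypothetical values back-calculated from the rounded figures (113 CBPs, 3442 non-CBPs, 52% sensitivity, 79% positive predictive value), and the function name `confusion_metrics` is purely illustrative.

```python
import math


def confusion_metrics(tp, fp, fn, tn):
    """Return (sensitivity, positive predictive value, MCC) for a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)   # fraction of true CBPs that are predicted as CBPs
    ppv = tp / (tp + fp)           # fraction of positive predictions that are correct
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, ppv, mcc


# ASSUMPTION: illustrative counts inferred from the rounded percentages,
# not values taken from the paper.
tp, fn = 59, 54      # 113 carbohydrate-binding domains in total
fp, tn = 16, 3426    # 3442 noncarbohydrate-binding proteins in total

sens, ppv, mcc = confusion_metrics(tp, fp, fn, tn)
print(f"sensitivity={sens:.2f}  PPV={ppv:.2f}  MCC={mcc:.2f}")
```

Under these assumed counts the three values round to 0.52, 0.79, and 0.63, which is consistent with the figures quoted for SPOT‐Struc. Because the negative set is about 30 times larger than the positive set, the Matthews correlation coefficient is a more informative summary here than raw accuracy.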
