Feature-incorporated alignment based ligand-binding residue prediction for carbohydrate-binding modules

MOTIVATION: Carbohydrate-binding modules (CBMs) share similar secondary and tertiary topology, but their primary sequence identity is low. Computational identification of ligand-binding residues helps biologists better understand the protein-carbohydrate binding mechanism. In general, functional characterization can alternatively be approached with alignment-based methods. However, because the accuracy of conventional alignment methods is often sensitive to sequence identity, low identity among query sequences makes it difficult to precisely locate the small regions that carry relevant features. We therefore propose a feature-incorporated alignment (FIA) that flexibly aligns conserved signatures in CBMs, and we implemented an FIA-based target-template prediction model to identify functional ligand-binding residues.

RESULTS: Arabidopsis thaliana CBM45 and CBM53 were used to validate the FIA-based prediction model. The predicted ligand-binding residues that lie on the surface of the hypothetical structures were verified to be ligand-binding residues. In the absence of 3D structural information, FIA significantly improved the estimation of sequence similarity and identity for a total of 808 sequences from 11 different CBM families when compared with six leading tools using the Friedman rank test.
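The Friedman rank test used for the tool comparison can be illustrated with a minimal sketch. The function names (`average_ranks`, `friedman_statistic`) and the score table are hypothetical and are not taken from the paper; the sketch computes only the basic Friedman chi-square statistic, without tie correction or a p-value, and is not the authors' evaluation code.

```python
def average_ranks(values):
    """Rank values in ascending order (1 = smallest); ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(table):
    """Friedman chi-square statistic (no tie correction).

    table[b][m] is the score of method m on block b, for example one
    alignment tool evaluated on one protein family.
    """
    n = len(table)      # number of blocks
    k = len(table[0])   # number of methods being compared
    rank_sums = [0.0] * k
    for row in table:
        for m, r in enumerate(average_ranks(row)):
            rank_sums[m] += r
    sum_sq = sum(r * r for r in rank_sums)
    return 12 * sum_sq / (n * k * (k + 1)) - 3 * n * (k + 1)


# Hypothetical scores: 3 blocks (rows) x 3 methods (columns), illustrative only.
scores = [
    [0.90, 0.80, 0.70],
    [0.85, 0.80, 0.60],
    [0.95, 0.70, 0.65],
]
print(friedman_statistic(scores))
```

Under the null hypothesis that all methods perform equally, the statistic is approximately chi-square distributed with k - 1 degrees of freedom, so larger values indicate more consistent rank differences among the methods.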
