Glycan classification with tree kernels

MOTIVATION: Glycans are covalent assemblies of sugars that play crucial roles in many cellular processes. Comprehensive data on the structure and function of glycans have recently been accumulated, so the need for methods and algorithms to analyze these data is growing fast.

RESULTS: This article presents novel methods for classifying glycans and detecting discriminative glycan motifs with support vector machines (SVMs). We propose a new class of tree kernels to measure the similarity between glycans. These kernels are based on the comparison of tree substructures and take into account several glycan features, such as the sugar type, the sugar bond type and the layer depth. The proposed methods are tested on their ability to classify human glycans into four blood components: leukemia cells, erythrocytes, plasma and serum. They are shown to outperform a previously published method. We also applied a feature selection approach to extract glycan motifs that are characteristic of each blood component. Some of the leukemia-specific glycan motifs detected by our method are consistent with results reported in the literature.

AVAILABILITY: Software is available upon request.

SUPPLEMENTARY INFORMATION: Datasets are available at the following website: http://web.kuicr.kyoto-u.ac.jp/supp/yoshi/glycankernel/
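To make the idea of comparing tree substructures concrete, the following is a minimal sketch of one simple kernel of this kind. It is not the kernel proposed in the article: the tree encoding, the choice of parent-bond-child triples as substructures, and the names `node`, `features` and `tree_kernel` are all assumptions made for illustration. The kernel value is the inner product of two substructure count vectors, with each substructure optionally tagged by its layer depth.

```python
from collections import Counter

# Hypothetical encoding: a node is (sugar, bond_to_parent, children).
# The root has no parent, so its bond is None.
def node(sugar, bond=None, children=()):
    return (sugar, bond, tuple(children))

def features(tree, use_depth=True):
    """Count (parent sugar, bond, child sugar) triples in a glycan tree.

    With use_depth=True each triple is also tagged with the layer depth
    of the parent, so the same linkage at different depths is counted
    as a different feature.
    """
    counts = Counter()
    stack = [(tree, 0)]
    while stack:
        (sugar, _bond, children), depth = stack.pop()
        for child in children:
            child_sugar, child_bond, _ = child
            key = (sugar, child_bond, child_sugar)
            if use_depth:
                key += (depth,)
            counts[key] += 1
            stack.append((child, depth + 1))
    return counts

def tree_kernel(a, b, use_depth=True):
    """Inner product of the two feature count vectors."""
    fa, fb = features(a, use_depth), features(b, use_depth)
    return sum(fa[k] * fb[k] for k in fa.keys() & fb.keys())

# Two toy glycans that differ only in the linkage of the terminal Gal.
g1 = node("Man", children=[
    node("GlcNAc", "b1-4", [node("Gal", "b1-4")]),
    node("Man", "a1-3"),
])
g2 = node("Man", children=[
    node("GlcNAc", "b1-4", [node("Gal", "b1-3")]),
    node("Man", "a1-3"),
])

print(tree_kernel(g1, g2), tree_kernel(g1, g1))
```

Because the kernel is an explicit inner product of count vectors, it is symmetric and positive semidefinite, so a matrix of pairwise values could in principle be passed to any SVM implementation that accepts a precomputed kernel.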
