Background

Glycobiology is the study of carbohydrate chains, or glycans, in a particular cell or organism. Many computational approaches have been proposed for analyzing these complex structures, which are chains of monosaccharides. The monosaccharides are linked to one another by glycosidic bonds, which can take on a variety of conformations, forming branches and producing complex tree structures. The q-gram method is one recent approach to understanding glycan function by classifying these tree structures. It assumes that, for a given q, different q-grams share no similarity with one another. That is, if two structures are composed of entirely different q-grams, they are treated as completely different. From a biological standpoint, however, this is not the case. In this paper, we propose a weighted q-gram method that measures the similarity among glycans by incorporating the similarity of geometric structures, monosaccharides and glycosidic bonds among q-grams. In contrast to the traditional q-gram method, the weighted q-gram method admits similarity among q-grams for a given q. We developed new kernels for glycan structures on this basis and applied them in support vector machines (SVMs) to classify glycans.

Results

Two glycan datasets were used to compare the weighted q-gram method with the original q-gram method. The results show that incorporating q-gram similarity improves classification performance for all of the important glycan classes tested.

Conclusion

The results in this paper indicate that similarity among q-grams, derived from geometric structure, monosaccharides and glycosidic linkages, contributes to the classification of glycan function. This is a significant step towards understanding glycan function on the basis of complex glycan structures.
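
To make the contrast between the two kernels concrete, the following is a minimal sketch, not the paper's implementation. It rests on several assumptions: a glycan is encoded as a nested `(monosaccharide, children)` tuple, a q-gram is simplified to a downward path of q monosaccharide labels, and the q-gram similarity is a hypothetical `position_similarity` that counts matching positions. Glycosidic linkage and geometric structure, which the weighted method also takes into account, are left out here.

```python
from collections import Counter
from itertools import product


def qgrams(tree, q):
    """Count downward label paths of length q in a (label, children) tree.

    Assumption: a q-gram is simplified here to a root-to-descendant path.
    """
    grams = []

    def paths_from(node, length):
        label, children = node
        if length == 1:
            return [(label,)]
        return [(label,) + rest
                for child in children
                for rest in paths_from(child, length - 1)]

    def visit(node):
        grams.extend(paths_from(node, q))
        for child in node[1]:
            visit(child)

    visit(tree)
    return Counter(grams)


def position_similarity(g, h):
    """Hypothetical q-gram similarity: fraction of positions with equal labels."""
    return sum(a == b for a, b in zip(g, h)) / len(g)


def qgram_kernel(x, y, q):
    """Unweighted kernel: only identical q-grams contribute."""
    cx, cy = qgrams(x, q), qgrams(y, q)
    return sum(cx[g] * cy[g] for g in cx)


def weighted_qgram_kernel(x, y, q, sim=position_similarity):
    """Weighted kernel: every pair of q-grams contributes in proportion to sim."""
    cx, cy = qgrams(x, q), qgrams(y, q)
    return sum(cx[g] * cy[h] * sim(g, h) for g, h in product(cx, cy))


# Toy glycans (illustrative only, not taken from the datasets in the paper).
glycan_a = ("Man", [("GlcNAc", []), ("Gal", [])])
glycan_b = ("Man", [("Glc", [])])

print(qgram_kernel(glycan_a, glycan_b, 2))           # no shared 2-gram
print(weighted_qgram_kernel(glycan_a, glycan_b, 2))  # partial matches count
```

For these two toy glycans, which share no 2-gram, the unweighted kernel is 0 while the weighted kernel is 1.0, because each 2-gram of `glycan_a` matches the single 2-gram of `glycan_b` in one of its two positions. A matrix of such kernel values could then be supplied to an SVM that accepts a precomputed kernel.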