Managing and analyzing carbohydrate data

Carbohydrates are among the most vital molecules in multicellular organisms, where they play an important structural role. In fact, all cells in nature carry carbohydrate sugar chains, or glycans, that help modulate various cell-cell events during the development of the organism. Unfortunately, informatics research on glycans has been slow in comparison to that on DNA and proteins, largely because glycan structures are difficult to analyze biologically. Our work applies data engineering approaches to gain some understanding of the glycan data that is currently publicly available. In particular, by modeling glycans as labeled unordered trees, we have implemented a tree-matching algorithm for measuring tree similarity. Our algorithm builds on proven, efficient methods from computer science that we have extended and adapted for glycan data. Moreover, glycans are recognized by various agents in multicellular organisms, and to capture the patterns that might be recognized we needed to model dependencies that appear to extend beyond the directly connected nodes in a tree. Therefore, by defining glycans as labeled ordered trees, we were able to develop a new probabilistic tree model with which sibling patterns across a tree could be mined. We provide promising results from our methodologies that could prove useful for the future of glycome informatics.
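
As an illustration of what measuring similarity between labeled unordered trees can look like, the following is a minimal sketch and not the tree-matching algorithm described above. It assumes a simplified glycan representation in which each node carries only a monosaccharide name (linkage information is ignored), and it scores two trees by the size of their largest common subtree that includes both roots, normalized Dice-style. The names `Node`, `common_size`, and `similarity` are hypothetical. Because sibling order is ignored, children are matched by trying every assignment, which is workable only because glycan nodes have few children.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Node:
    """A labeled tree node; children are stored as a tuple but treated as unordered."""
    label: str
    children: tuple = ()


def size(t: Node) -> int:
    """Number of nodes in the tree rooted at t."""
    return 1 + sum(size(c) for c in t.children)


def common_size(a: Node, b: Node) -> int:
    """Size of the largest common subtree that maps root a onto root b.

    Returns 0 when the root labels differ. Children are matched one-to-one
    in whichever assignment gives the highest total (brute force).
    """
    if a.label != b.label:
        return 0
    small, large = sorted((a.children, b.children), key=len)
    best = 0
    # Try every way of assigning each child in `small` to a distinct child in `large`.
    for perm in permutations(large, len(small)):
        score = sum(common_size(x, y) for x, y in zip(small, perm))
        best = max(best, score)
    return 1 + best


def similarity(a: Node, b: Node) -> float:
    """Dice-style score in [0, 1]; 1.0 means identical up to sibling order."""
    return 2 * common_size(a, b) / (size(a) + size(b))


# Two toy structures that differ only in sibling order, plus a smaller one.
t1 = Node("Man", (Node("GlcNAc"), Node("Gal", (Node("Fuc"),))))
t2 = Node("Man", (Node("Gal", (Node("Fuc"),)), Node("GlcNAc")))
t3 = Node("Man", (Node("Gal"),))

print(similarity(t1, t2))
print(similarity(t1, t3))
```

Under these assumptions, `t1` and `t2` score as identical because only the order of their branches differs, while `t3` shares just the `Man` and `Gal` nodes with `t1`. A practical method would replace the brute-force child assignment with dynamic programming and would also compare linkage labels.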
