Application of a new probabilistic model for recognizing complex patterns in glycans

MOTIVATION: The study of carbohydrate sugar chains, or glycans, has progressed slowly, mainly because of the difficulty in establishing standard methods for analyzing their structures and biosynthesis. Glycans are generally tree structures, which are more complex than linear DNA or protein sequences, and evidence suggests that glycans may contain patterns that span siblings and extend into more distant regions, and so are not confined to the edges of the tree structure itself. Existing models have not been able to capture such patterns.

RESULTS: We have applied a new probabilistic model, the probabilistic sibling-dependent tree Markov model (PSTMM), which inherently captures such complex patterns in glycans. The ability to capture such patterns is important in itself, and it also implies that PSTMM can perform multiple tree structure alignments efficiently. We show through experiments on actual glycan data that this model is useful for gaining insight into the hidden, complex patterns of glycans, which are crucial for the development and functioning of higher-level organisms. Furthermore, we show that the model can also be used as a new approach to multiple tree alignment, which has not been applied to glycan chains before. This extended use of PSTMM may be a major step forward for the structural analysis of glycans, and may consequently prove useful for discovering clues to their function.
