Paper information - cswHMM: A Novel Context Switching Hidden Markov Model for Biological Sequence Analysis

In this work we present a sequence model that goes beyond simple linear patterns to capture a specific type of higher-order relationship that can occur in biological sequences. In particular, we seek models that can account for partially overlaid and interleaved patterns. The proposed context-switching model (cswHMM) is designed as a variable-order hidden Markov model (HMM) whose structure allows control to switch between two or more sub-models. Tests of this approach suggest that combining HMMs for protein sequence analysis, such as pattern-mining-based HMMs or profile HMMs, with the context-switching approach can improve the descriptive ability and performance of the models.
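To make the idea of switching control between sub-models concrete, the following is a minimal sketch and not the authors' implementation. It assumes two first-order sub-models (the paper describes a variable-order design), a single fixed switching probability `p_switch`, and a uniform choice of the initially active sub-model. The names `forward_likelihood`, `model_a`, `model_b`, and the toy H/P alphabet are hypothetical and chosen only for illustration. Each composite state records which sub-model is active together with the current state of both sub-models, so the inactive sub-model is frozen and resumes from where it stopped, which is one simple way to represent interleaved patterns.

```python
from itertools import product


def forward_likelihood(seq, sub_models, p_switch):
    """Likelihood of seq under a toy two-sub-model switching HMM.

    Illustrative assumptions (not taken from the paper):
    - each sub-model is a first-order HMM given as a dict with
      'start' (list), 'trans' (list of lists), 'emit' (list of dicts);
    - a composite state is (active, i, j): the emitting sub-model plus
      the current state of sub-model a (i) and sub-model b (j);
    - at each step the active sub-model changes with probability p_switch,
      and only the sub-model that is active after the step advances.
    """
    if not seq:
        return 1.0
    a, b = sub_models
    states = list(product((0, 1), range(len(a["start"])), range(len(b["start"]))))

    def emit(state, sym):
        active, i, j = state
        return a["emit"][i][sym] if active == 0 else b["emit"][j][sym]

    def trans(src, dst):
        (act0, i0, j0), (act1, i1, j1) = src, dst
        p_act = 1.0 - p_switch if act0 == act1 else p_switch
        if act1 == 0:
            # Sub-model a advances; sub-model b stays frozen.
            return p_act * a["trans"][i0][i1] if j0 == j1 else 0.0
        # Sub-model b advances; sub-model a stays frozen.
        return p_act * b["trans"][j0][j1] if i0 == i1 else 0.0

    # Initialisation: uniform choice of the active sub-model (assumption).
    alpha = {
        s: 0.5 * a["start"][s[1]] * b["start"][s[2]] * emit(s, seq[0])
        for s in states
    }
    # Standard forward recursion over the composite state space.
    for sym in seq[1:]:
        alpha = {
            d: emit(d, sym) * sum(alpha[s] * trans(s, d) for s in states)
            for d in states
        }
    return sum(alpha.values())


# Hypothetical toy sub-models over a two-letter alphabet.
model_a = {  # alternating H/P pattern
    "start": [1.0, 0.0],
    "trans": [[0.0, 1.0], [1.0, 0.0]],
    "emit": [{"H": 0.9, "P": 0.1}, {"H": 0.1, "P": 0.9}],
}
model_b = {  # uninformative background
    "start": [1.0],
    "trans": [[1.0]],
    "emit": [{"H": 0.5, "P": 0.5}],
}

if __name__ == "__main__":
    print(forward_likelihood("HPHP", (model_a, model_b), p_switch=0.1))
```

The composite state space grows with the product of the sub-model sizes, so this layout is only practical for small sub-models; it is meant to show the switching mechanism, not to reproduce the structure or results reported for cswHMM.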

Matej Lexa | Vojtech Bystrý
