An Analysis of Continuous Time Markov Chains using Generator Matrices

This paper analyzes the applications of generator matrices in continuous-time Markov chains (CTMCs). Hidden Markov models (HMMs), together with related probabilistic models such as stochastic context-free grammars (SCFGs), are the basis of many algorithms for the analysis of biological sequences. Combined with the continuous-time Markov chain theory of likelihood-based phylogeny, stochastic grammar approaches are finding broad application in comparative sequence analysis, in particular the annotation of multiple alignments and simultaneous alignment. Stochastic grammars were originally used to annotate individual sequences and were later combined with phylogenetic models to annotate alignments. Trees have thus been combined with HMMs to predict genes and conserved regions in DNA sequences, secondary structures and transmembrane topologies in protein sequences, and base-pairing structures in RNA sequences. The role of the generator matrix in deriving various properties of continuous-time Markov chains is analyzed, with examples from the phylogenetic tree.
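As a minimal illustration of how a generator matrix determines the transition probabilities of a CTMC, the sketch below computes P(t) = exp(Qt) for a four-state nucleotide chain. It is not taken from the paper: the Jukes-Cantor form of Q, the rate `alpha`, the time `t`, and all function names are assumptions chosen for the example, and the matrix exponential is approximated by a truncated Taylor series with scaling and squaring.

```python
import math

def jukes_cantor_generator(alpha):
    """Hypothetical 4x4 generator Q: every off-diagonal rate is alpha,
    and each diagonal entry is -3*alpha so that rows sum to zero."""
    return [[-3.0 * alpha if i == j else alpha for j in range(4)]
            for i in range(4)]

def mat_mul(a, b):
    """Plain dense matrix product for small square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transition_matrix(q, t, squarings=10, terms=20):
    """Approximate P(t) = exp(Q t): Taylor series of exp(Q t / 2^s),
    followed by s repeated squarings."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term becomes a^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def jukes_cantor_closed_form(alpha, t):
    """Known closed-form solution of exp(Q t) for the generator above."""
    e = math.exp(-4.0 * alpha * t)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

Q = jukes_cantor_generator(alpha=0.1)   # assumed substitution rate
P = transition_matrix(Q, t=2.0)         # assumed branch length
P_exact = jukes_cantor_closed_form(alpha=0.1, t=2.0)
```

Each row of `P` is a probability distribution over the four states, and `P` should agree with `P_exact` up to floating-point error. For generators without a closed-form solution, a library routine for the matrix exponential would normally replace the hand-written series.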
