Inferring Markov chains: Bayesian estimation, model comparison, entropy rate, and out-of-class modeling

Markov chains are a natural and well-understood tool for describing one-dimensional patterns in time or space. We show how to infer kth-order Markov chains, for arbitrary k, from finite data by applying Bayesian methods to both parameter estimation and model-order selection. Extending existing results for multinomial models of discrete data, we connect inference to statistical mechanics through information-theoretic (type theory) techniques. We establish a direct relationship between the Bayesian evidence and the partition function, which allows straightforward calculation of the expectation and variance of the conditional relative entropy and of the source entropy rate. Finally, we introduce a method that combines finite data-size scaling with model-order comparison to infer the structure of out-of-class processes.
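The Bayesian steps named above can be illustrated with a minimal sketch. It assumes an independent symmetric Dirichlet(alpha) prior on each context's next-symbol distribution, a uniform prior over model orders, and that every order is conditioned on the same first `k_max` symbols so all models are scored on identical data. These are assumptions of this sketch, and all function names are hypothetical. The entropy-rate function is a simple plug-in estimate built from posterior-mean transition probabilities. It is not the posterior expectation and variance obtained through the partition function.

```python
import math
from collections import Counter


def transition_counts(seq, k, start):
    """Count (context, next symbol) pairs for a kth-order chain, using only
    transitions into positions start, start+1, ... (requires start >= k)."""
    if start < k:
        raise ValueError("start must be at least k")
    counts = Counter()
    for i in range(start, len(seq)):
        counts[(tuple(seq[i - k:i]), seq[i])] += 1
    return counts


def log_evidence(seq, k, alphabet, alpha=1.0, start=None):
    """log P(D | M_k) under an independent symmetric Dirichlet(alpha) prior
    on the next-symbol distribution of every length-k context."""
    start = k if start is None else start
    counts = transition_counts(seq, k, start)
    m = len(alphabet)
    total = 0.0
    # Contexts that never occur contribute a factor of 1, so they are skipped.
    for context in {c for c, _ in counts}:
        n_w = sum(counts[(context, s)] for s in alphabet)
        total += math.lgamma(m * alpha) - math.lgamma(n_w + m * alpha)
        for s in alphabet:
            total += math.lgamma(counts[(context, s)] + alpha) - math.lgamma(alpha)
    return total


def order_posterior(seq, alphabet, k_max, alpha=1.0):
    """P(M_k | D) for k = 0..k_max with a uniform prior over orders."""
    logs = [log_evidence(seq, k, alphabet, alpha, start=k_max)
            for k in range(k_max + 1)]
    top = max(logs)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in logs]
    z = sum(weights)
    return [w / z for w in weights]


def posterior_mean_transitions(seq, k, alphabet, alpha=1.0):
    """Posterior mean of P(s | w), namely (n(ws) + alpha) / (n(w) + m * alpha)."""
    counts = transition_counts(seq, k, k)
    m = len(alphabet)
    probs = {}
    for context in {c for c, _ in counts}:
        n_w = sum(counts[(context, s)] for s in alphabet)
        probs[context] = {s: (counts[(context, s)] + alpha) / (n_w + m * alpha)
                          for s in alphabet}
    return probs, counts


def plugin_entropy_rate(seq, k, alphabet, alpha=1.0):
    """Plug-in h = -sum_w p(w) sum_s p(s|w) log2 p(s|w), in bits per symbol,
    using empirical context frequencies and posterior-mean transitions."""
    probs, counts = posterior_mean_transitions(seq, k, alphabet, alpha)
    n_total = sum(counts.values())
    h = 0.0
    for context, dist in probs.items():
        p_w = sum(counts[(context, s)] for s in alphabet) / n_total
        h -= p_w * sum(p * math.log2(p) for p in dist.values())
    return h


if __name__ == "__main__":
    seq = "01" * 200  # a period-2 binary sequence
    print(order_posterior(seq, "01", k_max=2))
    print(posterior_mean_transitions(seq, 1, "01")[0][("0",)]["1"])
    print(plugin_entropy_rate(seq, 1, "01"))
```

On the period-2 example, order 0 receives negligible posterior weight, while orders 1 and 2 tie. Each of those orders uses exactly two observed contexts and predicts them perfectly, and unobserved contexts add nothing to the evidence.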
