Optimal error exponents in hidden Markov models order estimation

We consider the estimation of the number of hidden states (the order) of a discrete-time finite-alphabet hidden Markov model (HMM). The estimators we investigate are related to code-based order estimators: penalized maximum-likelihood (ML) estimators and penalized versions of the mixture estimator introduced by Liu and Narayan (1994). We prove strong consistency of these estimators without assuming any a priori upper bound on the order, and with smaller penalties than in previous works. We prove a version of Stein's lemma for HMM order estimation and derive an upper bound on the underestimation exponents. We then prove that this upper bound is achieved both by the penalized ML estimator and by the penalized mixture estimator. The proof of the latter result gets around the elusive nature of the ML estimator in HMMs by resorting to large-deviation techniques for empirical processes. Finally, we prove that for any consistent HMM order estimator and for most HMMs, the overestimation exponent is zero.
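The abstract describes the penalized ML order estimator only in words: it selects the order k that minimizes minus the maximized log-likelihood plus a penalty pen(k, n). The following is a minimal Python sketch under stated assumptions, not the paper's construction. The function names are hypothetical. The parameter count D_k = k(k-1) + k(r-1) (initial distribution not counted) and the BIC-like penalty c * D_k * log n with c = 0.5 are illustrative choices; they are not the smaller penalties analyzed in the paper. The maximized log-likelihood for each candidate order must be supplied externally, for instance from EM (Baum-Welch) runs, which only approximate the ML. The forward recursion shows how the HMM log-likelihood is evaluated for fixed parameters.

```python
import math


def forward_loglik(x, pi, A, B):
    """Log-likelihood log P(x_1..x_n) under a finite-alphabet HMM, via the
    scaled forward recursion.

    x  : list of observed symbols (ints in range(r))
    pi : initial distribution over the k hidden states
    A  : k x k transition matrix, A[i][j] = P(next state j | current state i)
    B  : k x r emission matrix, B[i][a] = P(symbol a | state i)
    """
    k = len(pi)
    # alpha[i] is proportional to P(x_1..x_t, state_t = i); renormalized each step
    alpha = [pi[i] * B[i][x[0]] for i in range(k)]
    c = sum(alpha)  # c = P(x_1)
    loglik = math.log(c)
    alpha = [a / c for a in alpha]
    for t in range(1, len(x)):
        alpha = [sum(alpha[i] * A[i][j] for i in range(k)) * B[j][x[t]]
                 for j in range(k)]
        c = sum(alpha)  # c = P(x_t | x_1..x_{t-1})
        loglik += math.log(c)
        alpha = [a / c for a in alpha]
    return loglik


def num_free_params(k, r):
    """Hypothetical parameter count for a k-state HMM over r symbols:
    k(k-1) transition parameters plus k(r-1) emission parameters."""
    return k * (k - 1) + k * (r - 1)


def penalized_order(max_loglik, n, r, c=0.5):
    """Return the k minimizing -max_loglik[k] + c * D_k * log(n).

    max_loglik : dict mapping candidate order k -> maximized log-likelihood
    n, r       : sample length and alphabet size
    c = 0.5 gives a BIC-like penalty (illustrative assumption only).
    """
    def criterion(k):
        return -max_loglik[k] + c * num_free_params(k, r) * math.log(n)
    return min(max_loglik, key=criterion)
```

For example, `penalized_order({1: -200.0, 2: -150.0, 3: -149.0}, n=100, r=2)` returns 2. Moving from order 2 to order 3 gains only one nat of log-likelihood, which does not pay for the five extra parameters. How fast the probability of choosing an order below the true one decays in n is what the underestimation exponent measures.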

[1]  Peter L. Bartlett, et al. Model Selection and Error Estimation, 2000, Machine Learning.

[2]  L. Baum, et al. Statistical Inference for Probabilistic Functions of Finite State Markov Chains, 1966.

[3]  Ofer Zeitouni, et al. On universal hypotheses testing via large deviations, 1991, IEEE Trans. Inf. Theory.

[4]  I. Csiszár, et al. The consistency of the BIC Markov order estimator, 2000.

[5]  Neri Merhav, et al. Universal composite hypothesis testing: A competitive minimax approach, 2002, IEEE Trans. Inf. Theory.

[6]  Sanjeev Khudanpur, et al. Order estimation for a special class of hidden Markov sources and binary renewal processes, 2002, IEEE Trans. Inf. Theory.

[7]  A. W. van der Vaart, et al. Asymptotic Statistics: U-Statistics, 1998.

[8]  Biing-Hwang Juang, et al. Hidden Markov Models for Speech Recognition, 1991.

[9]  B. Leroux. Maximum-likelihood estimation for hidden Markov models, 1992.

[10]  Y. Shtarkov, et al. The context-tree weighting method: basic properties, 1995, IEEE Trans. Inf. Theory.

[11]  Shun-ichi Amari, et al. Identifiability of hidden Markov information sources and their minimum degrees of freedom, 1992, IEEE Trans. Inf. Theory.

[12]  R. Douc, et al. Asymptotics of the maximum likelihood estimator for general hidden Markov models, 2001.

[13]  Neri Merhav, et al. Estimating the number of states of a finite-state source, 1992, IEEE Trans. Inf. Theory.

[14]  Amir Dembo, et al. Large Deviations Techniques and Applications, 1998.

[15]  Chuang-Chun Liu, et al. The optimal error exponent for Markov order estimation, 1996, IEEE Trans. Inf. Theory.

[16]  Edmund Taylor Whittaker, et al. A Course of Modern Analysis, 2021.

[17]  L. Finesso, et al. The optimal error exponent for Markov order estimation, 1993, Proceedings. IEEE International Symposium on Information Theory.

[18]  Neri Merhav, et al. When is the generalized likelihood ratio test optimal?, 1992, IEEE Trans. Inf. Theory.

[19]  R. M. Dudley, et al. Real Analysis and Probability, 1989.

[20]  A. W. van der Vaart, et al. Asymptotic Statistics, 1998.

[21]  Prakash Narayan, et al. Order estimation and sequential universal data compression of a hidden Markov source by the method of mixtures, 1994, IEEE Trans. Inf. Theory.

[22]  Jorma Rissanen, et al. Universal coding, information, prediction, and estimation, 1984, IEEE Trans. Inf. Theory.

[23]  R. Douc, et al. Asymptotic properties of the maximum likelihood estimator in autoregressive models with Markov regime, 2004, arXiv:math/0503681.

[24]  E. Gassiat. Likelihood ratio inequalities with applications to various mixtures, 2002.

[25]  E. Gassiat, et al. The likelihood ratio test for the number of components in a mixture with Markov regime, 2000.

[26]  Thomas M. Cover, et al. Elements of Information Theory, 2005.

[27]  P. Bickel, et al. Asymptotic normality of the maximum-likelihood estimator for general hidden Markov models, 1998.

[28]  Jorma Rissanen, et al. The Minimum Description Length Principle in Coding and Modeling, 1998, IEEE Trans. Inf. Theory.

[29]  James D. Hamilton. State-space models, 1994.

[30]  Neri Merhav, et al. A competitive Neyman-Pearson approach to universal hypothesis testing with applications, 2002, IEEE Trans. Inf. Theory.

[31]  John C. Kieffer, et al. Strongly consistent code-based identification and order estimation for constrained finite-state model classes, 1993, IEEE Trans. Inf. Theory.

[32]  Raphail E. Krichevsky, et al. The performance of universal encoding, 1981, IEEE Trans. Inf. Theory.

[33]  Vladimir Vapnik, et al. Statistical learning theory, 1998.

[34]  Laurent Mevel, et al. Exponential Forgetting and Geometric Ergodicity in Hidden Markov Models, 2000, Math. Control Signals Syst.

[35]  W. Hoeffding. Asymptotically Optimal Tests for Multinomial Distributions, 1965.

[36]  Neri Merhav, et al. On the estimation of the order of a Markov chain and universal data compression, 1989, IEEE Trans. Inf. Theory.

[37]  Ofer Zeitouni, et al. Correction to 'On Universal Hypotheses Testing Via Large Deviations', 1991, IEEE Trans. Inf. Theory.

[38]  T. Petrie, et al. Probabilistic functions of finite-state Markov chains, 1967, Proceedings of the National Academy of Sciences of the United States of America.

[39]  Iain L. MacDonald, et al. Hidden Markov and Other Models for Discrete-valued Time Series, 1997.

[40]  Rachel J. Mackay, et al. Estimating the order of a hidden Markov model, 2002.

[41]  J. Lynch, et al. A weak convergence approach to the theory of large deviations, 1997.

[42]  O. F. Cook. The Method of Types, 1898.

[43]  Michael B. Pursley, et al. Efficient universal noiseless source codes, 1981, IEEE Trans. Inf. Theory.

[44]  Jens Ledet Jensen, et al. Asymptotic normality of the maximum likelihood estimator in state space models, 1999.

[45]  Neri Merhav, et al. Hidden Markov processes, 2002, IEEE Trans. Inf. Theory.

[46]  Imre Csiszár. Large-scale typicality of Markov sample paths and consistency of MDL order estimators, 2002, IEEE Trans. Inf. Theory.

[47]  László Györfi, et al. A Probabilistic Theory of Pattern Recognition, 1996, Stochastic Modelling and Applied Probability.

[48]  Imre Csiszár, et al. The Method of Types, 1998, IEEE Trans. Inf. Theory.