Finite state space non parametric Hidden Markov Models are in general identifiable

In this paper, we prove that finite state space non parametric hidden Markov models are identifiable as soon as the transition matrix of the latent Markov chain has full rank and the emission probability distributions are linearly independent. We then propose several non parametric likelihood-based estimation methods, which we apply to models used in applications. Finally, we show on examples that non parametric modeling and estimation may improve classification performance.
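The two identifiability conditions can be checked numerically in a simple special case. The sketch below is illustrative only and not the paper's method: it assumes emissions take values in a finite set, so each emission distribution is a row of a matrix `B`, and linear independence reduces to a rank condition. The function name `check_identifiability_conditions` and the tolerance `tol` are hypothetical choices made for this example.

```python
import numpy as np


def check_identifiability_conditions(Q, B, tol=1e-10):
    """Check the two conditions from the abstract for a k-state HMM.

    Q : (k, k) transition matrix of the latent Markov chain.
    B : (k, m) matrix whose row j is the emission distribution of
        state j over m possible observed values (an assumption of
        this sketch: emissions are discretized on a finite set).
    Returns (transition_full_rank, emissions_independent).
    """
    Q = np.asarray(Q, dtype=float)
    B = np.asarray(B, dtype=float)
    k = Q.shape[0]
    # Condition 1: the transition matrix has full rank k.
    transition_full_rank = np.linalg.matrix_rank(Q, tol=tol) == k
    # Condition 2: the k emission distributions are linearly
    # independent, i.e. the rows of B span a k-dimensional space.
    emissions_independent = np.linalg.matrix_rank(B, tol=tol) == k
    return bool(transition_full_rank), bool(emissions_independent)


# A model satisfying both conditions.
Q_ok = [[0.9, 0.1], [0.2, 0.8]]
B_ok = [[0.5, 0.3, 0.2], [0.1, 0.3, 0.6]]
ok_result = check_identifiability_conditions(Q_ok, B_ok)

# A model violating both: identical rows in Q mean the hidden states
# are i.i.d., and identical rows in B mean the states emit alike.
Q_bad = [[0.5, 0.5], [0.5, 0.5]]
B_bad = [[0.2, 0.3, 0.5], [0.2, 0.3, 0.5]]
bad_result = check_identifiability_conditions(Q_bad, B_bad)
```

For continuous emission densities, the same idea would require a check of linear independence between functions rather than between rows of a finite matrix; that case is outside this sketch.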
