Hidden Markov Model Classification Based on Empirical Frequencies of Observed Symbols

Abstract: Given a sequence of observations, classification between two known hidden Markov models (HMMs) can be accomplished with a classifier that minimizes the probability of error (i.e., the probability of misclassification) by enforcing the maximum a posteriori probability (MAP) rule. For this MAP classifier, the a priori probability of error (before any observations are made) can be obtained, as a function of the length of the sequence of observations, by summing the probability of error over all possible observation sequences of the given length, which is a computationally expensive task. In this paper, we obtain an upper bound on the probability of error of the MAP classifier. Our results are based on a suboptimal decision rule that ignores the order in which observations occur and relies solely on the empirical frequencies with which different symbols appear. We describe necessary and sufficient conditions under which this bound on the probability of error decreases exponentially with the length of the observation sequence. Apart from its usefulness in bounding the probability of misclassification, the suboptimal rule has numerous advantages (such as low computational complexity, reduced storage requirements, and potential applicability to distributed or decentralized decision schemes) that could make it a useful alternative to the MAP rule for HMM classification in many applications.
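To illustrate why the exact a priori error computation is expensive, the following minimal sketch enumerates every observation sequence of a given length and sums the smaller of the two joint probabilities, which is the error the MAP rule incurs on that sequence. It is an illustration under stated assumptions, not the paper's method: the function names, the two toy models `hmm_a` and `hmm_b`, and the state-emission parameterization (initial distribution, transition matrix, emission matrix) are all hypothetical choices made for this example.

```python
from itertools import product

def sequence_likelihood(pi, A, B, obs):
    """Forward algorithm: P(obs | model) for initial distribution pi,
    transition matrix A[i][j], and emission matrix B[state][symbol]."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for y in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][y]
                 for j in range(n)]
    return sum(alpha)

def map_error_probability(hmm1, hmm2, prior1, length, num_symbols):
    """A priori probability of error of the MAP rule for sequences of the
    given length (length >= 1), by brute-force enumeration.
    The loop visits num_symbols ** length sequences."""
    prior2 = 1.0 - prior1
    total = 0.0
    for obs in product(range(num_symbols), repeat=length):
        joint1 = prior1 * sequence_likelihood(*hmm1, obs)
        joint2 = prior2 * sequence_likelihood(*hmm2, obs)
        # MAP picks the larger joint probability; the smaller one is the error.
        total += min(joint1, joint2)
    return total

# Hypothetical two-state, two-symbol models (illustrative values only).
hmm_a = ([0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]], [[0.8, 0.2], [0.3, 0.7]])
hmm_b = ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], [[0.4, 0.6], [0.6, 0.4]])

errors = [map_error_probability(hmm_a, hmm_b, 0.5, n, 2) for n in range(1, 9)]
```

The number of terms grows exponentially with the sequence length, which is the cost that motivates an upper bound that can be evaluated without enumeration.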
