Variational Bhattacharyya divergence for hidden Markov models

Many applications require the use of divergence measures between probability distributions. Several of these, such as the Kullback-Leibler (KL) divergence and the Bhattacharyya divergence, are tractable for simple distributions such as Gaussians, but are intractable for more complex distributions such as hidden Markov models (HMMs) used in speech recognizers. For tasks related to classification error, the Bhattacharyya divergence is of special importance, due to its relationship with the Bayes error. Here we derive novel variational approximations to the Bhattacharyya divergence for HMMs. Remarkably the variational Bhattacharyya divergence can be computed in a simple closed-form expression for a given sequence length. One of the approximations can even be integrated over all possible sequence lengths in a closed-form expression. We apply the variational Bhattacharyya divergence for HMMs to word confusability, the problem of estimating the probability of mistaking one spoken word for another.

[1]  John R. Hershey,et al.  Approximating the Kullback Leibler Divergence Between Gaussian Mixture Models , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[2]  Charles R. Johnson,et al.  Matrix analysis , 1985, Statistical Inference for Engineers and Data Scientists.

[3]  R. Kondor,et al.  Bhattacharyya and Expected Likelihood Kernels , 2003 .

[4]  Etienne Barnard,et al.  Phone clustering using the Bhattacharyya distance , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[5]  Peder A. Olsen,et al.  Theory and practice of acoustic confusability , 2002, Comput. Speech Lang..

[6]  George Saon,et al.  Minimum Bayes Error Feature Selection for Continuous Speech Recognition , 2000, NIPS.

[7]  John R. Hershey,et al.  Variational Kullback-Leibler divergence for Hidden Markov models , 2007, 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU).

[8]  John R. Hershey,et al.  Bhattacharyya error and divergence using variational importance sampling , 2007, INTERSPEECH.

[9]  Shrikanth S. Narayanan,et al.  Average divergence distance as a statistical discrimination measure for hidden Markov models , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[10]  Keinosuke Fukunaga,et al.  Statistical Pattern Recognition , 1993, Handbook of Pattern Recognition and Computer Vision.

[11]  John R. Hershey,et al.  Word confusability - measuring hidden Markov model similarity , 2007, INTERSPEECH.