EM mixture model probability table compression

This paper presents a new probability table compression method based on mixture models, applied to N-tuple recognizers. Joint probability tables are modeled by mixtures of lower-dimensional probability distributions together with their mixture coefficients. The maximum-likelihood parameters of the mixture models are trained with the expectation-maximization (EM) algorithm and quantized to one-byte integers. Probability elements that the mixture models do not estimate reliably are stored separately. Experimental results on on-line handwritten UNIPEN digits show that the total memory size of an N-tuple recognizer is reduced from 11.8 Mbytes to 0.55 Mbytes, while the recognition rate drops only from 97.7% to 97.5%.
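
As a concrete illustration of the idea, the sketch below factors a joint probability table P(x, y) into a mixture of products of one-dimensional distributions, Q(x, y) = sum_k w_k a_k(x) b_k(y), fits the parameters by EM (maximum likelihood, i.e. minimizing KL(P || Q)), quantizes them to one-byte integers, and keeps poorly approximated entries in a separate exception list. This is a minimal reconstruction from the abstract alone, not the paper's algorithm: the component count K, the uniform quantization scheme, and the exception threshold are all illustrative assumptions.

```python
import numpy as np

def em_fit(P, K=8, iters=200, seed=0):
    """Fit Q(x,y) = sum_k w[k] * a[k,x] * b[k,y] to the joint table P by EM."""
    rng = np.random.default_rng(seed)
    X, Y = P.shape
    w = np.full(K, 1.0 / K)
    a = rng.random((K, X)); a /= a.sum(axis=1, keepdims=True)
    b = rng.random((K, Y)); b /= b.sum(axis=1, keepdims=True)
    for _ in range(iters):
        # E-step: responsibility r[k,x,y] proportional to w_k a_k(x) b_k(y).
        joint = w[:, None, None] * a[:, :, None] * b[:, None, :]
        Q = joint.sum(axis=0) + 1e-12
        r = joint / Q                     # shape (K, X, Y)
        # M-step: reweight responsibilities by the target table and renormalize.
        s = r * P                         # expected "counts" per component
        w = s.sum(axis=(1, 2)); w /= w.sum()
        a = s.sum(axis=2); a /= a.sum(axis=1, keepdims=True) + 1e-12
        b = s.sum(axis=1); b /= b.sum(axis=1, keepdims=True) + 1e-12
    return w, a, b

def quantize(p):
    """Uniformly quantize a probability array to one-byte integers plus a scale."""
    scale = p.max() / 255.0 + 1e-12
    return np.round(p / scale).astype(np.uint8), scale

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    P = rng.random((64, 64)); P /= P.sum()      # toy joint probability table
    w, a, b = em_fit(P)
    Q = np.einsum("k,kx,ky->xy", w, a, b)
    # Entries the mixture reconstructs poorly are stored separately
    # (the threshold below is an arbitrary illustrative choice).
    bad = np.abs(P - Q) > 10 * P.mean()
    exceptions = {tuple(i): P[tuple(i)] for i in np.argwhere(bad)}
    qa, sa = quantize(a)
    qb, sb = quantize(b)
    print(f"KL(P||Q) = {np.sum(P * np.log((P + 1e-12) / (Q + 1e-12))):.4f}")
    print(f"exceptions kept: {len(exceptions)} of {P.size}")
```

The memory saving comes from the factorization: a K-component mixture stores roughly K * (X + Y + 1) one-byte parameters plus the exception list in place of the X * Y entries of the original joint table.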
