Probability table compression using distributional clustering for scanning n-tuple classifiers

A method for compressing tables of probability distributions using distributional clustering is presented and applied to shrink the look-up tables of a scanning n-tuple handwritten character recognizer. Lossy compression is realized by clustering n-tuples that are observed to induce similar class probability distributions. A new distance metric called "weighted mean KL divergence" is introduced to assess similarity and account for the cumulative effect of merging two distributions. After compression, cluster membership is rebalanced in an annealing-like process. The proposed method is evaluated on three isolated-character subsets of the UNIPEN database. Compression ratios in excess of 2000:1 are demonstrated for 5-tuple classifiers.
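The merging step described above can be illustrated with a minimal sketch. The abstract does not give the formula for the "weighted mean KL divergence", so the cost used here is an assumption: the count-weighted KL divergence of each row to the count-weighted mean of the two rows, which grows with the number of observations affected by a merge. The function names (`kl`, `merge`, `merge_cost`, `greedy_cluster`) and the greedy agglomerative loop are hypothetical, and the annealing-like rebalancing step is not shown.

```python
import math

def kl(p, q):
    """KL divergence D(p || q) in nats; terms with p[i] == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def merge(p, n_p, q, n_q):
    """Count-weighted mean of two class distributions (the merged table row)."""
    total = n_p + n_q
    return [(n_p * pi + n_q * qi) / total for pi, qi in zip(p, q)]

def merge_cost(p, n_p, q, n_q):
    """Assumed cost (not taken from the paper): count-weighted KL divergence
    of each row to the merged row. Counts must be positive."""
    m = merge(p, n_p, q, n_q)
    return n_p * kl(p, m) + n_q * kl(q, m)

def greedy_cluster(rows, counts, k):
    """Repeatedly merge the cheapest pair of rows until only k rows remain.

    rows   -- one class probability distribution per n-tuple
    counts -- number of training observations behind each row
    Returns the merged rows, their counts, and the original row indices
    assigned to each cluster.
    """
    rows = [list(r) for r in rows]
    counts = list(counts)
    members = [[i] for i in range(len(rows))]
    while len(rows) > k:
        best = None
        for i in range(len(rows)):
            for j in range(i + 1, len(rows)):
                cost = merge_cost(rows[i], counts[i], rows[j], counts[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        # Fold row j into row i, then drop row j (j > i, so index i stays valid).
        rows[i] = merge(rows[i], counts[i], rows[j], counts[j])
        counts[i] += counts[j]
        members[i] += members[j]
        del rows[j], counts[j], members[j]
    return rows, counts, members

# Three n-tuples over two classes: the first two induce similar distributions.
table = [[0.90, 0.10], [0.88, 0.12], [0.10, 0.90]]
observed = [10, 20, 5]
merged_rows, merged_counts, clusters = greedy_cluster(table, observed, k=2)
```

In this toy table the first two rows end up in one cluster, so a lookup for either n-tuple would return the same shared row. The exhaustive pairwise search is quadratic per merge and is meant only to show the idea, not to scale to full 5-tuple tables.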
