Classification Using a Hierarchical Bayesian Approach

A key problem faced by classifiers is coping with styles not represented in the training set. We present an application of hierarchical Bayesian methods to the problem of recognizing degraded printed characters in a variety of fonts. The proposed method uses training data from various styles and classes to compute prior distributions on the parameters of the class-conditional distributions. At classification time, the parameters of the actual class-conditional distributions are fitted using an EM algorithm. The advantage of hierarchical Bayesian methods is motivated with a theoretical example. Severalfold increases in classification performance relative to style-oblivious and style-conscious classifiers are demonstrated on a multifont OCR task.
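The two stages described above (priors estimated from multi-style training data, then EM fitting of the class-conditional parameters on data from an unseen style) can be illustrated with a minimal sketch. It is not the paper's model: it assumes Gaussian class-conditional densities with a known, shared isotropic variance `sigma2`, a Gaussian prior on each class mean, and equal class priors. The function names `fit_prior`, `responsibilities`, and `map_em` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def fit_prior(style_means):
    """Estimate a Gaussian prior on class means from training styles.

    style_means: array of shape (n_styles, n_classes, dim) holding the
    per-style mean of each class. Returns the prior mean per class and a
    single pooled prior variance tau2 (an assumption of this sketch).
    """
    prior_mean = style_means.mean(axis=0)
    tau2 = max(style_means.var(axis=0).mean(), 1e-6)  # floor avoids 1/0
    return prior_mean, tau2

def responsibilities(x, mu, sigma2):
    """Posterior class probabilities under isotropic Gaussians, equal priors."""
    d2 = ((x[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    log_r = -0.5 * d2 / sigma2
    log_r -= log_r.max(axis=1, keepdims=True)  # numerical stability
    r = np.exp(log_r)
    return r / r.sum(axis=1, keepdims=True)

def map_em(x, prior_mean, tau2, sigma2, n_iter=20):
    """Fit class means for one unseen style by MAP-EM on unlabeled samples x."""
    mu = prior_mean.copy()
    for _ in range(n_iter):
        r = responsibilities(x, mu, sigma2)            # E-step
        num = (r.T @ x) / sigma2 + prior_mean / tau2   # M-step: data term
        den = r.sum(axis=0)[:, None] / sigma2 + 1.0 / tau2  # plus prior term
        mu = num / den
    labels = responsibilities(x, mu, sigma2).argmax(axis=1)
    return mu, labels
```

In this sketch the M-step is a precision-weighted average of the soft-assigned sample mean and the prior mean, so a class with few samples in the new style stays close to its prior, while a well-populated class adapts to the style. A style-oblivious baseline corresponds to classifying with `prior_mean` directly and skipping the EM loop.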
