Font adaptation of an HMM-based OCR system

We build a polyfont OCR recognizer from character-level hidden Markov models (HMMs) trained on a dataset covering various fonts. We compare this system with monofont recognizers and show that its performance drops when it is used to recognize unseen fonts. To close this performance gap, we adapt the model parameters of the polyfont recognizer to a new dataset of unseen fonts using four different adaptation algorithms. Our experiments show that the adapted system is far more accurate than the initial system, although it does not reach the accuracy of a monofont recognizer.
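
To make the idea of adapting model parameters concrete, the sketch below shows maximum a posteriori (MAP) re-estimation of a single Gaussian mean, a standard technique for adapting continuous-density HMMs. The abstract does not name its four algorithms, so the choice of MAP here is an assumption for illustration only. The function name `map_adapt_mean`, the prior weight `tau`, and the toy values are hypothetical; the state-occupancy posteriors are assumed to come from a forward-backward pass that is not shown.

```python
def map_adapt_mean(prior_mean, frames, posteriors, tau=10.0):
    """MAP update of one Gaussian mean (illustrative sketch, not the paper's code).

    prior_mean : mean vector of the polyfont (font-independent) model
    frames     : feature vectors extracted from the new-font adaptation data
    posteriors : occupancy probability of this Gaussian for each frame
                 (assumed to be computed elsewhere by forward-backward)
    tau        : hypothetical prior weight; larger values trust the prior more
    """
    dim = len(prior_mean)
    occupancy = sum(posteriors)

    # Posterior-weighted sum of the adaptation frames.
    weighted_sum = [0.0] * dim
    for frame, gamma in zip(frames, posteriors):
        for d in range(dim):
            weighted_sum[d] += gamma * frame[d]

    # Interpolate between the prior mean and the adaptation-data statistics.
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
        for d in range(dim)
    ]


# Toy usage: two frames from an unseen font pull the mean away from the prior.
adapted = map_adapt_mean([0.0, 0.0], [[2.0, 4.0], [2.0, 4.0]], [1.0, 1.0], tau=2.0)
```

With little adaptation data the estimate stays close to the polyfont mean, and with more data it moves toward the new font's statistics, which fits the trade-off the abstract describes between the initial system and a monofont recognizer.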
