Handwritten word preprocessing for database adaptation

Handwriting recognition systems are typically trained using publicly available databases, where data have been collected in controlled conditions (image resolution, paper background, noise level,...). Since this is not often the case in real-world scenarios, classification performance can be affected when novel data is presented to the word recognition system. To overcome this problem, we present in this paper a new approach called database adaptation. It consists of processing one set (training or test) in order to adapt it to the other set (test or training, respectively). Specifically, two kinds of preprocessing, namely stroke thickness normalization and pixel intensity normalization are considered. The advantage of such approach is that we can re-use the existing recognition system trained on controlled data. We conduct several experiments with the Rimes 2011 word database and with a real-world database. We adapt either the test set or the training set. Results show that training set adaptation achieves better results than test set adaptation, at the cost of a second training stage on the adapted data. Accuracy of data set adaptation is increased by 2% to 3% in absolute value over no adaptation.

[1]  Chafic Mokbel,et al.  Dynamic and Contextual Information in HMM Modeling for Handwritten Word Recognition , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  D. P. D'Amato,et al.  RESULTS FROM A PERFORMANCE EVALUATION OF HANDWRITTEN ADDRESS RECOGNITION SYSTEMS FOR THE UNITED STATES POSTAL SERVICE , 2004 .

[3]  Philip C. Woodland,et al.  Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models , 1995, Comput. Speech Lang..

[4]  Chafic Mokbel,et al.  Arabic handwriting recognition using baseline dependant features and hidden Markov modeling , 2005, Eighth International Conference on Document Analysis and Recognition (ICDAR'05).

[5]  Haikal El Abed,et al.  ICDAR 2011 - French Handwriting Recognition Competition , 2011, 2011 International Conference on Document Analysis and Recognition.

[6]  Alessandro Vinciarelli,et al.  A survey on off-line Cursive Word Recognition , 2002, Pattern Recognit..

[7]  Emmanuel Augustin,et al.  A2iA Check Reader: a family of bank check recognition systems , 1999, Proceedings of the Fifth International Conference on Document Analysis and Recognition. ICDAR '99 (Cat. No.PR00318).

[8]  Matti Pietikäinen,et al.  Adaptive document image binarization , 2000, Pattern Recognit..

[9]  Chin-Hui Lee,et al.  MAP Estimation of Continuous Density HMM : Theory and Applications , 1992, HLT.

[10]  Volkmar Frinken,et al.  Self-training Strategies for Handwriting Word Recognition , 2009, ICDM.

[11]  Anne-Laure Bianne Bernard Reconnaissance de mots manuscrits cursifs par modèles de Markov cachés en contexte : application au français, à l'anglais et à l'arabe , 2011 .

[12]  Steve Young,et al.  The HTK hidden Markov model toolkit: design and philosophy , 1993 .