A Method of Synthesizing Handwritten Chinese Images for Data Augmentation

The performance of printed document recognition has been significantly improved by generating synthetic images to augment the training data, particularly by providing more variability in the linguistic content. Handwriting recognition benefits less from this kind of data augmentation, and the only variability usually added is through artificially generated combinations of skew, slant and noise. Generating handwritten text is complex due to variations in the form, scale and spatial placement of the characters, and can be further complicated by the cursive aspects of the script. We propose a novel strategy, in the particular case of Chinese characters, to generate synthetic lines of text given samples of the isolated characters. The well-known CASIA database is used both to train MDLSTM-RNN models and to create synthetic line images. On an independent set of real-world images, a model trained only on synthetic images achieved a small relative reduction of 4.4% in character error rate with respect to a baseline model trained exclusively on real images, while training on a combination of real and synthetic images resulted in an appreciable relative reduction of 10.4%.
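To make the general idea concrete, the following is a minimal Python sketch of how a synthetic text line can be composed from isolated character crops: each glyph is randomly rescaled, placed with random horizontal spacing and vertical baseline jitter, and the resulting line is given a small random slant. The function name synthesize_line and all sampling ranges are illustrative assumptions, not the exact procedure of the paper.

import random
import numpy as np
from PIL import Image

def synthesize_line(char_images, target_height=64, seed=None):
    """Compose a synthetic text-line image from isolated character images.

    char_images: list of grayscale PIL.Image crops (dark ink on a light
    background), one per character. Returns a single PIL.Image line.
    Illustrative sketch only; parameters are assumptions.
    """
    rng = random.Random(seed)

    # Randomly rescale each character to mimic natural size variation.
    scaled = []
    for img in char_images:
        scale = rng.uniform(0.8, 1.0)
        h = max(1, int(target_height * scale))
        w = max(1, int(img.width * h / img.height))
        scaled.append(img.convert("L").resize((w, h), Image.BILINEAR))

    # Horizontal gaps between characters are also drawn at random.
    gaps = [rng.randint(2, 12) for _ in scaled]
    total_width = sum(im.width for im in scaled) + sum(gaps)

    canvas = Image.new("L", (total_width, target_height), color=255)
    x = 0
    for im, gap in zip(scaled, gaps):
        y = rng.randint(0, target_height - im.height)  # baseline jitter
        # Keep the darker (ink) pixel wherever the glyph overlaps the canvas.
        region = canvas.crop((x, y, x + im.width, y + im.height))
        blended = np.minimum(np.array(region), np.array(im))
        canvas.paste(Image.fromarray(blended), (x, y))
        x += im.width + gap

    # Small global shear approximates the slant augmentation mentioned above.
    shear = rng.uniform(-0.2, 0.2)
    canvas = canvas.transform(
        canvas.size, Image.AFFINE, (1, shear, 0, 0, 1, 0),
        resample=Image.BILINEAR, fillcolor=255)
    return canvas

The pixel-wise minimum is used for compositing so that dark strokes are preserved on the white background even when neighbouring glyph boxes overlap slightly; skew and noise models could be applied to the finished line in the same spirit as the augmentations described in the abstract.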
