Glyph-Based Data Augmentation for Accurate Kanji Character Recognition

In this paper, we address the problem of data augmentation for character recognition. In particular, we focus on incorporating glyph variation into the augmentation of character images, which is a simple approach to data augmentation. Existing methods generally increase the amount of data by distorting images, whereas the proposed method injects noise into glyphs, producing data with drastic variation in glyph shape. The proposed method exploits a public database of kanji glyphs and augments the glyphs by injecting noise into them. We then automatically generate kanji images by placing stroke images on the augmented glyphs. We carried out kanji character recognition experiments using the augmented data, and the results show the effectiveness of the proposed method.
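The two-stage pipeline described above (perturb a stroke-level glyph description, then render it as an image) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. We assume that each glyph is a list of strokes given as polylines of control points normalized to the unit square and that the injected noise is independent Gaussian jitter on those control points. We also assume that drawing a thick line per stroke is an acceptable stand-in for placing stroke images. The names `inject_noise`, `render`, `augment`, and the toy glyph `KANJI_TEN` (十) are hypothetical.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]   # (x, y) in the unit square [0, 1] x [0, 1]
Stroke = List[Point]          # assumed: a stroke is a polyline of control points
Glyph = List[Stroke]          # assumed: a glyph is a list of strokes

# Hypothetical toy glyph for 十: one horizontal and one vertical stroke.
KANJI_TEN: Glyph = [
    [(0.1, 0.5), (0.9, 0.5)],
    [(0.5, 0.1), (0.5, 0.9)],
]


def inject_noise(glyph: Glyph, sigma: float, rng: random.Random) -> Glyph:
    """Return a copy of `glyph` with Gaussian noise added to every control point.

    Noisy coordinates are clamped back into the unit square.
    """
    noisy: Glyph = []
    for stroke in glyph:
        noisy_stroke: Stroke = []
        for x, y in stroke:
            nx = min(max(x + rng.gauss(0.0, sigma), 0.0), 1.0)
            ny = min(max(y + rng.gauss(0.0, sigma), 0.0), 1.0)
            noisy_stroke.append((nx, ny))
        noisy.append(noisy_stroke)
    return noisy


def render(glyph: Glyph, size: int = 32, width: int = 1) -> List[List[int]]:
    """Rasterize each stroke as a thick polyline onto a size x size binary image.

    A square pen of half-width `width` pixels stands in for real stroke images.
    """
    img = [[0] * size for _ in range(size)]
    for stroke in glyph:
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
            # Sample the segment densely enough to leave no gaps at this resolution.
            steps = max(2, int(max(abs(x1 - x0), abs(y1 - y0)) * size * 2))
            for i in range(steps + 1):
                t = i / steps
                cx = int(round((x0 + t * (x1 - x0)) * (size - 1)))
                cy = int(round((y0 + t * (y1 - y0)) * (size - 1)))
                for dy in range(-width, width + 1):
                    for dx in range(-width, width + 1):
                        px, py = cx + dx, cy + dy
                        if 0 <= px < size and 0 <= py < size:
                            img[py][px] = 1
    return img


def augment(glyph: Glyph, n: int, sigma: float = 0.03,
            size: int = 32, seed: int = 0) -> List[List[List[int]]]:
    """Generate `n` images, each rendered from an independently perturbed glyph."""
    rng = random.Random(seed)
    return [render(inject_noise(glyph, sigma, rng), size) for _ in range(n)]


if __name__ == "__main__":
    for img in augment(KANJI_TEN, n=3):
        print(sum(map(sum, img)), "ink pixels")
```

Because the perturbation acts on control points rather than on pixels, each sample changes stroke positions and angles while the strokes themselves stay clean, which is the contrast with image-distortion augmentation drawn in the abstract. The noise scale `sigma` is an illustrative value, not one taken from the paper.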
