A handwritten Chinese character recognition method based on sample set expansion and CNN

Convolutional neural networks (CNNs) are a powerful technique for classifying visual inputs. However, both the scale and the quality of the training set strongly affect the performance of a learned system. In real applications, it is generally difficult to obtain a large-scale, high-quality sample set of handwritten Chinese characters, and insufficient training samples lead to poor recognition performance. In this paper, we propose a handwritten Chinese character recognition method based on dataset expansion and CNNs. First, we describe the topology of the proposed CNN model. Then, we apply several dataset expansion techniques, including random elastic deformation, shear transformation, and small-angle rotation, to enlarge the set of available samples. Experiments on the HCL2000 handwritten Chinese character database show that our method effectively improves recognition performance, reducing the error rate by 35.01%, which verifies the effectiveness of the proposed approach.
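The dataset expansion step can be illustrated with a minimal NumPy sketch. It is written under our own assumptions and is not the authors' implementation. The abstract does not give transformation parameters, so the rotation range (`max_angle`), shear range (`max_shear`), and elastic-field settings (`alpha`, `sigma`) are illustrative placeholders. All helper names are hypothetical, and a 64×64 grayscale array is assumed as input. Each generated sample first applies a random small rotation combined with a horizontal shear about the image center. It then applies a random elastic deformation in the style of Simard et al. (2003): a uniform random displacement field is smoothed with a Gaussian of width `sigma`, scaled by `alpha`, and used to resample the image with bilinear interpolation.

```python
import numpy as np


def gaussian_kernel1d(sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def smooth(field: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur (rows, then columns) with implicit zero padding."""
    k = gaussian_kernel1d(sigma)
    if k.size > min(field.shape):
        raise ValueError("sigma too large for this image size")

    def blur(v):
        return np.convolve(v, k, mode="same")

    return np.apply_along_axis(blur, 0, np.apply_along_axis(blur, 1, field))


def bilinear_sample(img: np.ndarray, ys: np.ndarray, xs: np.ndarray) -> np.ndarray:
    """Sample img at fractional (row, col) coordinates; points outside read as 0."""
    h, w = img.shape
    padded = np.pad(img, 1)  # one-pixel zero border acts as background
    py = np.clip(ys + 1.0, 0.0, h + 1.0)  # shift into padded coordinates
    px = np.clip(xs + 1.0, 0.0, w + 1.0)
    y0 = np.minimum(np.floor(py).astype(int), h)
    x0 = np.minimum(np.floor(px).astype(int), w)
    wy, wx = py - y0, px - x0
    top = (1 - wx) * padded[y0, x0] + wx * padded[y0, x0 + 1]
    bottom = (1 - wx) * padded[y0 + 1, x0] + wx * padded[y0 + 1, x0 + 1]
    return (1 - wy) * top + wy * bottom


def affine_warp(img: np.ndarray, angle_deg: float, shear: float) -> np.ndarray:
    """Rotate by angle_deg and shear horizontally (x' = x + shear*y) about the center."""
    h, w = img.shape
    t = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
    shr = np.array([[1.0, shear], [0.0, 1.0]])
    inv = np.linalg.inv(rot @ shr)  # backward mapping: output pixel -> source pixel
    cy, cx = (h - 1) / 2, (w - 1) / 2
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    dx, dy = xx - cx, yy - cy
    src_x = inv[0, 0] * dx + inv[0, 1] * dy + cx
    src_y = inv[1, 0] * dx + inv[1, 1] * dy + cy
    return bilinear_sample(img, src_y, src_x)


def elastic_deform(img: np.ndarray, alpha: float, sigma: float,
                   rng: np.random.Generator) -> np.ndarray:
    """Displace each pixel by a smoothed random field scaled by alpha."""
    h, w = img.shape
    dx = alpha * smooth(rng.uniform(-1, 1, size=(h, w)), sigma)
    dy = alpha * smooth(rng.uniform(-1, 1, size=(h, w)), sigma)
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    return bilinear_sample(img, yy + dy, xx + dx)


def expand(img: np.ndarray, n_copies: int, rng: np.random.Generator,
           max_angle: float = 10.0, max_shear: float = 0.2,
           alpha: float = 34.0, sigma: float = 4.0) -> np.ndarray:
    """Generate n_copies distorted variants of one character image (illustrative defaults)."""
    out = []
    for _ in range(n_copies):
        warped = affine_warp(img, rng.uniform(-max_angle, max_angle),
                             rng.uniform(-max_shear, max_shear))
        out.append(elastic_deform(warped, alpha, sigma, rng))
    return np.stack(out)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    char = np.zeros((64, 64))
    char[16:48, 30:34] = 1.0  # stand-in "stroke"; real input would be an HCL2000 image
    print(expand(char, 10, rng).shape)  # (10, 64, 64)
```

Composing the affine warp and a fresh elastic field for each copy means every original character yields `n_copies` distinct variants. In practice the parameter ranges would need tuning so that distorted characters stay legible and keep their class identity; this sketch does not check that.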
