A Comprehensive Analysis of Misclassified Handwritten Chinese Character Samples by Incorporating Human Recognition

The development of convolutional neural networks (CNNs) has led to revolutionary progress on the offline handwritten Chinese character recognition (HCCR) problem. Because the recognition rate on a standard offline HCCR testbed is now very high, the few samples that remain misclassified have attracted our interest. In this paper, with the help of human recognition results, we present a comprehensive analysis of the samples misclassified by a state-of-the-art CNN model. We performed the analysis based on the top-1-votes, which are obtained from a statistical analysis of the human recognition results, and derived the following conclusions: (1) the majority of samples with high top-1-votes were mislabeled; in addition, comparing the human recognition results with those of the CNN reveals some limitations of the CNN that leave scope for further improvement; (2) among the samples with medium top-1-votes, samples with different confidence levels have different characteristics, and some of them could be regarded as multi-label samples; (3) the samples with low top-1-votes are either wrongly written or written in a heavily cursive style, which makes them difficult to match to their given ground truths; (4) the relationship between writing styles and misclassifications is also discussed. We believe this work provides insights and new clues for designing classification methods that can deal with these challenging samples.
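The abstract does not define how a top-1-vote is computed, so the following is only a minimal sketch under stated assumptions: each misclassified sample is shown to several human annotators, the top-1-vote is taken to be the share of annotators who gave the most frequent answer, and samples are then grouped into high, medium, and low bands. The function names `top1_vote` and `vote_band` and the thresholds 0.4 and 0.8 are hypothetical and are not taken from the paper.

```python
from collections import Counter


def top1_vote(responses):
    """Return (label, share) for the most frequent human response.

    Assumption: the top-1-vote is the fraction of annotators who chose
    the most common label; the paper may define it differently.
    """
    if not responses:
        raise ValueError("at least one response is required")
    label, count = Counter(responses).most_common(1)[0]
    return label, count / len(responses)


def vote_band(share, low=0.4, high=0.8):
    """Group a top-1-vote share into a band (thresholds are hypothetical)."""
    if share >= high:
        return "high"
    if share >= low:
        return "medium"
    return "low"


# Five hypothetical annotators label one sample whose ground truth is "末".
responses = ["未", "未", "末", "未", "未"]
label, share = top1_vote(responses)
band = vote_band(share)
# A high band whose top label differs from the ground truth would flag
# the sample as a candidate mislabel, in the spirit of conclusion (1).
print(label, share, band)
```

Under these assumptions, a sample in the high band whose most frequent human label disagrees with its ground truth is a candidate for relabeling, while a sample in the medium band with two competing labels is a candidate multi-label sample.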
