Local Discriminant Training and Global Optimization for Convolutional Neural Network Based Handwritten Chinese Character Recognition

This paper investigates local discriminant training and global optimization methods for Convolutional Neural Network (CNN) to improve its discriminant ability and recognition accuracy. For local discriminant training, we propose to combine triplet loss and softmax with cross-entropy loss as the loss function. The triplet loss is incorporated into an additional fully-connected layer before the final fully-connected layer of a CNN model. For global optimization, we use Conditional Random Field (CRF) to further utilize the pairwise distance of the CNN feature vectors trained with triplet loss. Experiments with different CNN models on handwritten Chinese character samples show that the combined local discriminant training and global optimization scheme achieves better character recognition accuracy and confidence analysis performance.

[1]  Vladlen Koltun,et al.  Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials , 2011, NIPS.

[2]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Xiang Yu,et al.  Deep Metric Learning via Lifted Structured Feature Embedding , 2016 .

[4]  Gong Cheng,et al.  RIFD-CNN: Rotation-Invariant and Fisher Discriminative Convolutional Neural Networks for Object Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Jason Yosinski,et al.  Deep neural networks are easily fooled: High confidence predictions for unrecognizable images , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Lei Zhang,et al.  Bit-Scalable Deep Hashing With Regularized Similarity Learning for Image Retrieval and Person Re-Identification , 2015, IEEE Transactions on Image Processing.

[8]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[9]  R. B. Potts Some generalized order-disorder transformations , 1952, Mathematical Proceedings of the Cambridge Philosophical Society.

[10]  Shengcai Liao,et al.  Embedding Deep Metric for Person Re-identification: A Study Against Large Variations , 2016, ECCV.

[11]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[12]  Abhinav Gupta,et al.  Unsupervised Learning of Visual Representations Using Videos , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[13]  Iasonas Kokkinos,et al.  Discriminative Learning of Deep Convolutional Feature Point Descriptors , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[14]  Andrew Zisserman,et al.  Deep Structured Output Learning for Unconstrained Text Recognition , 2014, ICLR.

[15]  Yang Song,et al.  Learning Fine-Grained Image Similarity with Deep Ranking , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[17]  Ryuki Tachibana,et al.  Convolutional neural network pre-trained with projection matrices on linear discriminant analysis , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[18]  Edward H. Adelson,et al.  Discovering states and transformations in image collections , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).