A Multi-Scale CRNN Model for Chinese Papery Medical Document Recognition

Paper-based medical documents are still widely used in many countries, yet their contents are difficult for patients to store and manage. Electronic medical documents not only help solve these problems but also promote the development of telemedicine and medical big data, so transforming traditional printed medical documents into electronic form has become a key issue. Recognizing Chinese medical documents in image form is a challenging task: they contain a wide variety of characters and symbols, including Greek letters and mathematical symbols, and the structure of Chinese characters is often intricate. Popular Optical Character Recognition (OCR) methods are currently designed for single-scale characters and tend to perform poorly in such complex scenarios. Building on the Convolutional Recurrent Neural Network (CRNN), this paper proposes a multi-scale architecture to recognize multilingual characters. To verify its effectiveness, the model is trained on a synthetic dataset and evaluated on a real Chinese medical document dataset. The experimental results demonstrate that the proposed method achieves a substantial improvement over recent methods.
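To make the "multi-scale CRNN" idea concrete, below is a minimal PyTorch sketch of such an architecture: parallel convolutional branches with different receptive fields are concatenated, collapsed column-wise into a sequence, and decoded by a bidirectional LSTM whose outputs are trained with CTC. This is an illustrative reconstruction, not the authors' exact network; the branch configuration, kernel sizes, channel widths, and hidden size are assumptions.

```python
# Sketch of a multi-scale CRNN for text-line recognition (illustrative only).
import torch
import torch.nn as nn

class MultiScaleCRNN(nn.Module):
    def __init__(self, num_classes, img_h=32, in_ch=1, hidden=256):
        super().__init__()
        # Two convolutional branches with different receptive fields,
        # intended to capture both fine strokes and larger glyph structure.
        self.branch_small = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, 1, 1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, 1, 1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
        )
        self.branch_large = nn.Sequential(
            nn.Conv2d(in_ch, 64, 5, 1, 2), nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 5, 1, 2), nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
        )
        # After two 2x2 poolings the feature-map height is img_h // 4.
        feat_h = img_h // 4
        self.rnn = nn.LSTM(256 * feat_h, hidden, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, num_classes)  # includes the CTC blank

    def forward(self, x):                              # x: (B, C, H, W)
        f = torch.cat([self.branch_small(x), self.branch_large(x)], dim=1)
        b, c, h, w = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)  # one time step per image column
        seq, _ = self.rnn(f)
        return self.fc(seq)                             # (B, W, num_classes)
```

For training, the per-column logits would typically be passed through `log_softmax`, permuted to shape (W, B, num_classes), and fed to `nn.CTCLoss` together with the label sequences, which is the standard CRNN + CTC recipe the abstract refers to.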
