A Hybrid Model Reuse Training Approach for Multilingual OCR

There is growing demand for multilingual optical character recognition (MOCR) in web applications. Long Short-Term Memory (LSTM) networks have recently yielded excellent results on printed Latin-script recognition, but they are not flexible enough to cope with the challenges posed by web applications, where an OCR model for a given set of languages must be obtained quickly. This paper proposes a Hybrid Model Reuse (HMR) training approach for the multilingual OCR task, based on 1D bidirectional LSTM networks coupled with a model reuse scheme. Specifically, the Fixed Model Reuse (FMR) scheme is analyzed and incorporated into our approach, implicitly extracting useful discriminative information from a fixed text-generating model. Moreover, LSTM layers from networks pre-trained on unilingual OCR tasks are reused to initialize the weights of the target network. Experimental results show that the proposed HMR approach, without the assistance of any post-processing techniques, effectively accelerates training and yields higher accuracy than traditional approaches.
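The weight-reuse step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the pre-trained unilingual model and the target multilingual model share LSTM layer shapes, while the output layer is re-initialized for the larger multilingual character set. All layer names, shapes, and the helper `reuse_lstm_weights` are hypothetical.

```python
import numpy as np

def reuse_lstm_weights(pretrained, target, reused_layers=("lstm_fw", "lstm_bw")):
    """Copy selected LSTM layer weights from a pre-trained unilingual model
    into the target multilingual model; layers with mismatched shapes
    (e.g. the output layer over a different charset) are left untouched."""
    for name in reused_layers:
        if name in pretrained and pretrained[name].shape == target[name].shape:
            target[name] = pretrained[name].copy()
    return target

rng = np.random.default_rng(0)

# Pre-trained unilingual OCR model (e.g. Latin charset of ~90 symbols).
pretrained = {"lstm_fw": rng.normal(size=(4, 4)),
              "lstm_bw": rng.normal(size=(4, 4)),
              "softmax": rng.normal(size=(4, 90))}

# Target multilingual model: same bidirectional LSTM shapes, but a
# freshly initialized output layer over a larger multilingual charset.
target = {"lstm_fw": np.zeros((4, 4)),
          "lstm_bw": np.zeros((4, 4)),
          "softmax": rng.normal(size=(4, 200))}

target = reuse_lstm_weights(pretrained, target)
```

After this initialization, training proceeds on the multilingual data, so the reused layers serve as a warm start rather than being frozen.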
