A Hybrid Model Reuse Training Approach for Multilingual OCR

There is growing demand for multilingual optical character recognition (MOCR) in web applications. Long Short-Term Memory (LSTM) networks have recently yielded excellent results on printed Latin-script recognition, but they are not flexible enough to cope with the challenges posed by web applications, where an OCR model for a given set of languages must be obtained quickly. This paper proposes a Hybrid Model Reuse (HMR) training approach for the multilingual OCR task, based on 1D bidirectional LSTM networks coupled with a model reuse scheme. Specifically, the Fixed Model Reuse (FMR) scheme is analyzed and incorporated into our approach, implicitly extracting useful discriminative information from a fixed text-generating model. Moreover, LSTM layers from networks pre-trained on unilingual OCR tasks are reused to initialize the weights of the target network. Experimental results show that the proposed HMR approach, without the assistance of any post-processing techniques, effectively accelerates training and yields higher accuracy than traditional approaches.
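The weight-reuse step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the pre-trained unilingual model and the target multilingual model share LSTM layer shapes, while the output layer is re-initialized for the larger multilingual character set. All layer names, shapes, and the helper `reuse_lstm_weights` are hypothetical.

```python
import numpy as np

def reuse_lstm_weights(pretrained, target, reused_layers=("lstm_fw", "lstm_bw")):
    """Copy selected LSTM layer weights from a pre-trained unilingual model
    into the target multilingual model; layers with mismatched shapes
    (e.g. the output layer over a different charset) are left untouched."""
    for name in reused_layers:
        if name in pretrained and pretrained[name].shape == target[name].shape:
            target[name] = pretrained[name].copy()
    return target

rng = np.random.default_rng(0)

# Pre-trained unilingual OCR model (e.g. Latin charset of ~90 symbols).
pretrained = {"lstm_fw": rng.normal(size=(4, 4)),
              "lstm_bw": rng.normal(size=(4, 4)),
              "softmax": rng.normal(size=(4, 90))}

# Target multilingual model: same bidirectional LSTM shapes, but a
# freshly initialized output layer over a larger multilingual charset.
target = {"lstm_fw": np.zeros((4, 4)),
          "lstm_bw": np.zeros((4, 4)),
          "softmax": rng.normal(size=(4, 200))}

target = reuse_lstm_weights(pretrained, target)
```

After this initialization, training proceeds on the multilingual data, so the reused layers serve as a warm start rather than being frozen.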
