Two-Step CNN Framework for Text Line Recognition in Camera-Captured Images

In this paper, we introduce an on-device text line recognition framework designed for mobile or embedded systems. We treat per-character segmentation as a language-independent problem and individual character recognition as a language-dependent one. The proposed solution is therefore based on two separate artificial neural networks (ANNs) combined with dynamic programming, rather than on image processing methods for the segmentation step or on an end-to-end ANN. To satisfy the tight memory constraints imposed by embedded systems and to avoid overfitting, we employ ANNs with a small number of trainable parameters. The primary purpose of our framework is the recognition of low-quality images of identity documents with complex backgrounds and a variety of languages and fonts. We demonstrate that our solution achieves high recognition accuracy on natural datasets even when trained on purely synthetic data. We use the MIDV-500 and Census 1961 Project datasets for text line recognition. The proposed method considerably surpasses the algorithmic method implemented in Tesseract 3.05, the LSTM method (Tesseract 4.00), and the unpublished method used in the ABBYY FineReader 15 system. Our framework is also faster than the other compared solutions. We show the language independence of our segmenter in experiments with Cyrillic, Armenian, and Chinese text lines.
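The abstract says the segmentation network and the character recognition network are combined with dynamic programming, but it does not give the exact formulation. The sketch below is one plausible way to do this, under stated assumptions: the segmenter is assumed to output, for every column position, a probability that a character boundary lies there; the recognizer is assumed to return a label and a confidence for any column span; and a segmentation path is scored as the sum of the log-probabilities of its cuts and of its recognized characters. The names `best_segmentation`, `toy_recognizer`, `min_width` and `max_width` are hypothetical and illustrative only; this is not the authors' implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical recognizer interface (an assumption, not the paper's API):
# given a column span [start, end), return (label, probability).
Recognizer = Callable[[int, int], Tuple[str, float]]


def best_segmentation(
    cut_prob: Sequence[float],
    recognize: Recognizer,
    min_width: int = 1,
    max_width: int = 50,
    eps: float = 1e-9,
) -> Tuple[str, List[int]]:
    """Pick the highest-scoring sequence of cuts by dynamic programming.

    cut_prob[x] is the assumed segmenter probability that position x is a
    boundary between characters. Positions run from 0 (left edge) to W
    (right edge), so len(cut_prob) == W + 1.
    """
    n = len(cut_prob)
    width = n - 1
    neg_inf = float("-inf")
    score = [neg_inf] * n                  # best log-score of a path ending at x
    back: List[Tuple[int, str]] = [(-1, "")] * n
    score[0] = math.log(max(cut_prob[0], eps))

    for end in range(1, n):
        cut_term = math.log(max(cut_prob[end], eps))
        # Only consider segments whose width lies in [min_width, max_width].
        for start in range(max(0, end - max_width), end - min_width + 1):
            if score[start] == neg_inf:
                continue
            label, p = recognize(start, end)
            s = score[start] + math.log(max(p, eps)) + cut_term
            if s > score[end]:
                score[end] = s
                back[end] = (start, label)

    if score[width] == neg_inf:
        raise ValueError("no segmentation satisfies the width constraints")

    # Backtrack from the right edge to recover labels and cut positions.
    cuts, labels, pos = [width], [], width
    while pos > 0:
        start, label = back[pos]
        labels.append(label)
        cuts.append(start)
        pos = start
    return "".join(reversed(labels)), list(reversed(cuts))


def toy_recognizer(start: int, end: int) -> Tuple[str, float]:
    """Hypothetical stand-in for a character classifier on a 6-column line."""
    table = {(0, 3): ("A", 0.9), (3, 6): ("B", 0.9)}
    return table.get((start, end), ("?", 0.1))


# Toy line: a confident boundary at column 3, weak evidence elsewhere.
text, cuts = best_segmentation([1.0, 0.01, 0.01, 0.9, 0.01, 0.01, 1.0], toy_recognizer)
print(text, cuts)  # AB [0, 3, 6]
```

Scoring in log space turns the product of segmenter and recognizer probabilities into a sum, which is what makes the left-to-right dynamic programming recurrence valid. Keeping the segmenter and the recognizer behind separate interfaces mirrors the split described in the abstract: the segmenter can be reused across scripts while only the recognizer is swapped per language.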