Deep Neural Networks Combined with STN for Multi-Oriented Text Detection and Recognition

Developing systems that interpret visual content, such as images and videos, is a challenging but important task, and such systems need to be evaluated on benchmark datasets. This study addresses that challenge with the STN-OCR model, which combines deep neural networks (DNNs) with Spatial Transformer Networks (STNs). The network architecture consists of two stages: a localization network and a recognition network. The localization network finds and localizes text regions and generates a sampling grid. The recognition network takes the text regions as input and learns to recognize the text, including low-resolution, curved, and multi-oriented text. Because deep learning-based approaches require large amounts of data to train effectively, this study uses two benchmark datasets, Street View House Numbers (SVHN) and International Conference on Document Analysis and Recognition (ICDAR) 2015, to evaluate the system. On these datasets, the STN-OCR model achieves better results than those reported in the literature.
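
As an illustration of the sampling-grid step, the following minimal NumPy sketch shows how an affine sampling grid can be built and used to extract a region from an image, in the general manner of a spatial transformer. It is not the implementation used in this study: the function names `affine_grid` and `bilinear_sample` are hypothetical, the image is single-channel, and the 2x3 matrix `theta` is fixed by hand here, whereas in an STN-based model it would be predicted by the localization network.

```python
import numpy as np


def affine_grid(theta, out_h, out_w):
    """Map each output pixel to a source location in normalized [-1, 1] coordinates.

    theta is a 2x3 affine matrix (assumed here; a localization network
    would normally predict it).
    """
    xs = np.linspace(-1.0, 1.0, out_w)
    ys = np.linspace(-1.0, 1.0, out_h)
    x_t, y_t = np.meshgrid(xs, ys)  # each has shape (out_h, out_w)
    ones = np.ones_like(x_t)
    # Homogeneous target coordinates, shape (3, out_h * out_w).
    target = np.stack([x_t.ravel(), y_t.ravel(), ones.ravel()])
    source = theta @ target  # shape (2, out_h * out_w)
    x_s = source[0].reshape(out_h, out_w)
    y_s = source[1].reshape(out_h, out_w)
    return x_s, y_s


def bilinear_sample(image, x_s, y_s):
    """Sample a 2D image at normalized coordinates with bilinear interpolation.

    Coordinates that fall outside the image are clamped to the border.
    """
    h, w = image.shape
    # Convert normalized coordinates to pixel coordinates.
    x = (x_s + 1.0) * (w - 1) / 2.0
    y = (y_s + 1.0) * (h - 1) / 2.0
    x0 = np.clip(np.floor(x).astype(int), 0, w - 1)
    y0 = np.clip(np.floor(y).astype(int), 0, h - 1)
    x1 = np.clip(x0 + 1, 0, w - 1)
    y1 = np.clip(y0 + 1, 0, h - 1)
    # Interpolation weights, clamped so out-of-range points use the border value.
    wx = np.clip(x - x0, 0.0, 1.0)
    wy = np.clip(y - y0, 0.0, 1.0)
    top = (1.0 - wx) * image[y0, x0] + wx * image[y0, x1]
    bottom = (1.0 - wx) * image[y1, x0] + wx * image[y1, x1]
    return (1.0 - wy) * top + wy * bottom


# Example: a scale of 0.5 with no translation crops the central region.
image = np.arange(25, dtype=float).reshape(5, 5)
theta = np.array([[0.5, 0.0, 0.0],
                  [0.0, 0.5, 0.0]])
x_s, y_s = affine_grid(theta, 3, 3)
crop = bilinear_sample(image, x_s, y_s)
```

Because every step is differentiable with respect to `theta`, a grid of this kind lets a recognition loss propagate back into the network that predicts the transformation, which is the usual motivation for pairing a localization stage with a recognition stage.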
