Real-Time Text Localization in Natural Scene Images Using a Linear Spatial Filter

This paper proposes a novel method for text localization in natural images based on the connected component (CC) approach. First, CCs are isolated by convolving a multi-scale pyramid with a specifically designed linear spatial filter, followed by hysteresis thresholding. Next, non-textual CCs are pruned using a local classifier consisting of a cascade of multilayer perceptrons (MLPs) fed with increasingly extended feature vectors. The stroke width feature is estimated in linear time by computing the maximal inscribed squares in each CC. Candidate CCs and their neighbors are then checked using a more global MLP classifier that takes into account the target CC and its vicinity. Finally, text sequences are extracted at all pyramid levels and fused using dynamic programming. The main contribution of the proposed method is its execution speed: it can process 1080p HD video at nearly 30 frames per second on a standard laptop. In addition, it delivers competitive results in terms of precision and recall on the ICDAR 2013 Robust Reading dataset.
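
The stroke width step can be illustrated with a minimal sketch. The abstract does not give the authors' exact procedure, so the code below uses the standard dynamic-programming recurrence for the largest all-foreground square ending at each pixel, which visits every pixel once and is therefore linear in the number of pixels. The function names, the binary-mask input format, and the choice of taking the largest side as the stroke width estimate are all assumptions made for illustration.

```python
def max_inscribed_squares(mask):
    """For each pixel, return the side of the largest all-foreground
    square whose bottom-right corner is that pixel.

    `mask` is a list of rows of 0/1 values (hypothetical input format).
    Each pixel is visited once, so the cost is linear in the pixel count.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    side = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue  # background pixels cannot end a square
            if r == 0 or c == 0:
                side[r][c] = 1  # border pixels only fit a 1x1 square
            else:
                # Extend the smallest of the three neighboring squares.
                side[r][c] = 1 + min(side[r - 1][c],
                                     side[r][c - 1],
                                     side[r - 1][c - 1])
    return side


def estimate_stroke_width(mask):
    """Assumed estimator: the side of the largest inscribed square."""
    side = max_inscribed_squares(mask)
    return max((v for row in side for v in row), default=0)


# A horizontal bar, 3 pixels thick and 10 pixels long, with a 1-pixel margin.
bar = [[0] * 12] + [[0] + [1] * 10 + [0] for _ in range(3)] + [[0] * 12]
stroke = estimate_stroke_width(bar)
```

For the bar above, the largest inscribed square is bounded by the bar's thickness, so the estimate equals 3. A real component would likely need a more robust statistic than the maximum (for example, a median over the component), since junctions between strokes admit larger squares.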
