Rotation-Invariant Features for Multi-Oriented Text Detection in Natural Images

Texts in natural scenes carry rich semantic information, which can be used to assist a wide range of applications, such as object recognition, image/video retrieval, mapping/navigation, and human computer interaction. However, most existing systems are designed to detect and recognize horizontal (or near-horizontal) texts. Due to the increasing popularity of mobile-computing devices and applications, detecting texts of varying orientations from natural images under less controlled conditions has become an important but challenging task. In this paper, we propose a new algorithm to detect texts of varying orientations. Our algorithm is based on a two-level classification scheme and two sets of features specially designed for capturing the intrinsic characteristics of texts. To better evaluate the proposed method and compare it with the competing algorithms, we generate a comprehensive dataset with various types of texts in diverse real-world scenes. We also propose a new evaluation protocol, which is more suitable for benchmarking algorithms for detecting texts in varying orientations. Experiments on benchmark datasets demonstrate that our system compares favorably with the state-of-the-art algorithms when handling horizontal texts and achieves significantly enhanced performance on variant texts in complex natural scenes.

[1]  Kai Wang,et al.  End-to-end scene text recognition , 2011, 2011 International Conference on Computer Vision.

[2]  Simon M. Lucas,et al.  ICDAR 2003 robust reading competitions , 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings..

[3]  Anil K. Jain,et al.  Automatic caption localization in compressed video , 1999, Proceedings 1999 International Conference on Image Processing (Cat. 99CH36348).

[4]  Eli Shechtman,et al.  Matching Local Self-Similarities across Images and Videos , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Jitendra Malik,et al.  Shape matching and object recognition using shape contexts , 2010, 2010 3rd International Conference on Computer Science and Information Technology.

[6]  Weiqiang Wang,et al.  Robustly Extracting Captions in Videos Based on Stroke-Like Edges and Spatio-Temporal Analysis , 2012, IEEE Transactions on Multimedia.

[7]  Jin Hyung Kim,et al.  Texture-Based Approach for Text Detection in Images Using Support Vector Machines and Continuously Adaptive Mean Shift Algorithm , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[8]  Jiřı́ Matas,et al.  Real-time scene text localization and recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Matthieu Cord,et al.  Snoopertext: A multiresolution system for text detection in complex visual scenes , 2010, 2010 IEEE International Conference on Image Processing.

[10]  Zhuowen Tu,et al.  Detecting Texts of Arbitrary Orientations in 1 Natural Images , 2012 .

[11]  Michael R. Lyu,et al.  A comprehensive method for multilingual video text detection, localization, and extraction , 2005, IEEE Transactions on Circuits and Systems for Video Technology.

[12]  Xian-Sheng Hua,et al.  An automatic performance evaluation protocol for video text detection algorithms , 2004, IEEE Transactions on Circuits and Systems for Video Technology.

[13]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[14]  David S. Doermann,et al.  Camera-based analysis of text and documents: a survey , 2005, International Journal of Document Analysis and Recognition (IJDAR).

[15]  Anil K. Jain,et al.  Locating text in complex color images , 1995, Proceedings of 3rd International Conference on Document Analysis and Recognition.

[16]  Wumo Pan,et al.  Text detection from natural scene images using topographic maps and sparse representations , 2009, ICIP 2009.

[17]  Michael Elad,et al.  Image Denoising Via Sparse and Redundant Representations Over Learned Dictionaries , 2006, IEEE Transactions on Image Processing.

[18]  Vladimir Pavlovic,et al.  Real-Time Vision for Human-Computer Interaction , 2010 .

[19]  S.M. Lucas,et al.  ICDAR 2005 text locating competition results , 2005, Eighth International Conference on Document Analysis and Recognition (ICDAR'05).

[20]  Gary R. Bradski,et al.  Real time face and object tracking as a component of a perceptual user interface , 1998, Proceedings Fourth IEEE Workshop on Applications of Computer Vision. WACV'98 (Cat. No.98EX201).

[21]  Lina J. Karam,et al.  Morphological text extraction from images , 2000, IEEE Trans. Image Process..

[22]  Jiri Matas,et al.  A Method for Text Localization and Recognition in Real-World Images , 2010, ACCV.

[23]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[24]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[25]  Andrew Zisserman,et al.  Representing shape with a spatial pyramid kernel , 2007, CIVR '07.

[26]  David S. Doermann,et al.  Automatic text detection and tracking in digital video , 2000, IEEE Trans. Image Process..

[27]  Joseph J. Lim,et al.  Recognition using regions , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Jean-Michel Jolion,et al.  Object count/area graphs for the evaluation of object detection and segmentation algorithms , 2006, International Journal of Document Analysis and Recognition (IJDAR).

[29]  Yi Ma,et al.  TILT: Transform Invariant Low-Rank Textures , 2010, ACCV 2010.

[30]  Chew Lim Tan,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence, Manuscript Id a Laplacian Approach to Multi-oriented Text Detection in Video , 2022 .

[31]  Yingli Tian,et al.  Localizing Text in Scene Images by Boundary Clustering, Stroke Segmentation, and String Fragment Classification , 2012, IEEE Transactions on Image Processing.

[32]  Edward J. Delp,et al.  A Low Complexity Sign Detection and Text Localization Method for Mobile Applications , 2011, IEEE Transactions on Multimedia.

[33]  Ming Zhao,et al.  Text detection in images using sparse representation with discriminative dictionaries , 2010, Image Vis. Comput..

[34]  Allen Y. Yang,et al.  Robust Face Recognition via Sparse Representation , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  Alan L. Yuille,et al.  Detecting and reading text in natural scenes , 2004, CVPR 2004.

[36]  Peter Peer,et al.  An improved edge profile based method for text detection in images of natural scenes , 2011, 2011 IEEE EUROCON - International Conference on Computer as a Tool.

[37]  Klaus Meyer-Wegener,et al.  NEOCR: A Configurable Dataset for Natural Image Text Recognition , 2011, CBDAR.

[38]  Jiri Matas,et al.  Robust wide-baseline stereo from maximally stable extremal regions , 2004, Image Vis. Comput..

[39]  Edward A. Fox,et al.  Digital libraries , 1995, CACM.

[40]  Yuxiao Hu,et al.  Text From Corners: A Novel Approach to Detect Text and Caption in Videos , 2011, IEEE Transactions on Image Processing.

[41]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[42]  Anil K. Jain,et al.  Automatic text location in images and video frames , 1998, Proceedings. Fourteenth International Conference on Pattern Recognition (Cat. No.98EX170).

[43]  Yonatan Wexler,et al.  Detecting text in natural scenes with stroke width transform , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[44]  Jorge Stolfi,et al.  T-HOG: An effective gradient-based descriptor for single line text regions , 2013, Pattern Recognit..

[45]  Herbert Freeman,et al.  Determining the minimum-area encasing rectangle for an arbitrary closed curve , 1975, CACM.

[46]  Anil K. Jain,et al.  Text information extraction in images and video: a survey , 2004, Pattern Recognit..

[47]  A. McCallum,et al.  Sign detection in natural images with conditional random fields , 2004, Proceedings of the 2004 14th IEEE Signal Processing Society Workshop Machine Learning for Signal Processing, 2004..

[48]  Huizhong Chen,et al.  Robust text detection in natural images with edge-enhanced Maximally Stable Extremal Regions , 2011, 2011 18th IEEE International Conference on Image Processing.

[49]  Kai Wang,et al.  Word Spotting in the Wild , 2010, ECCV.

[50]  Cheng-Lin Liu,et al.  A Hybrid Approach to Detect and Localize Texts in Natural Scene Images , 2011, IEEE Transactions on Image Processing.

[51]  Rainer Lienhart,et al.  Localizing and segmenting text in images and videos , 2002, IEEE Trans. Circuits Syst. Video Technol..

[52]  John Wright,et al.  RASL: Robust alignment by sparse and low-rank decomposition for linearly correlated images , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[53]  Chucai Yi,et al.  Text String Detection From Natural Scenes by Structure-Based Partition and Grouping , 2011, IEEE Transactions on Image Processing.

[54]  Huizhong Chen,et al.  Mobile visual search on printed documents using text and low bit-rate features , 2011, 2011 18th IEEE International Conference on Image Processing.

[55]  Klaus Nordhausen,et al.  The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition by Trevor Hastie, Robert Tibshirani, Jerome Friedman , 2009 .

[56]  Edward M. Riseman,et al.  Finding text in images , 1997, DL '97.

[57]  Jiri Matas,et al.  Text Localization in Real-World Images Using Efficiently Pruned Exhaustive Search , 2011, 2011 International Conference on Document Analysis and Recognition.

[58]  Alberto Ruiz Affine alignment for stroke classification , 2002, Proceedings Eighth International Workshop on Frontiers in Handwriting Recognition.

[59]  John F. Canny,et al.  A Computational Approach to Edge Detection , 1986, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[60]  Jean-Marc Odobez,et al.  Text detection, recognition in images and video frames , 2004, Pattern Recognit..

[61]  Wenyu Liu,et al.  Feature context for image classification and object detection , 2011, CVPR 2011.

[62]  Bernd Freisleben,et al.  Text detection in images based on unsupervised classification of high-frequency wavelet coefficients , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[63]  Satoshi Goto,et al.  A Contour-Based Robust Algorithm for Text Detection in Color Images , 2006, IEICE Trans. Inf. Syst..

[64]  Andrew Y. Ng,et al.  Text Detection and Character Recognition in Scene Images with Unsupervised Feature Learning , 2011, 2011 International Conference on Document Analysis and Recognition.

[65]  Andreas Dengel,et al.  ICDAR 2011 Robust Reading Competition Challenge 2: Reading Text in Scene Images , 2011, 2011 International Conference on Document Analysis and Recognition.

[66]  Avinash C. Kak,et al.  Vision for Mobile Robot Navigation: A Survey , 2002, IEEE Trans. Pattern Anal. Mach. Intell..