T-HOG: An effective gradient-based descriptor for single line text regions

We discuss the use of histogram of oriented gradients (HOG) descriptors as an effective tool for text description and recognition. Specifically, we propose a HOG-based texture descriptor (T-HOG) that uses a partition of the image into overlapping horizontal cells with gradual boundaries, to characterize single-line texts in outdoor scenes. The input of our algorithm is a rectangular image presumed to contain a single line of text in Roman-like characters. The output is a relatively short descriptor that provides an effective input to an SVM classifier. Extensive experiments show that the T-HOG is more accurate than Dalal and Triggs's original HOG-based classifier, for any descriptor size. In addition, we show that the T-HOG is an effective tool for text/non-text discrimination and can be used in various text detection applications. In particular, combining T-HOG with a permissive bottom-up text detector is shown to outperform state-of-the-art text detection systems in two major publicly available databases.

[1]  Jorge Stolfi,et al.  Text detection and recognition in urban scenes , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[2]  Ioannis Pratikakis,et al.  A two-stage scheme for text detection in video images , 2010, Image Vis. Comput..

[3]  Matthieu Cord,et al.  Snoopertext: A multiresolution system for text detection in complex visual scenes , 2010, 2010 IEEE International Conference on Image Processing.

[4]  Cheng-Lin Liu,et al.  A Robust System to Detect and Localize Texts in Natural Scene Images , 2008, 2008 The Eighth IAPR International Workshop on Document Analysis Systems.

[5]  Umapada Pal,et al.  Recent Advances in Video Based Document Processing: A Review , 2012, 2012 10th IAPR International Workshop on Document Analysis Systems.

[6]  Matti Pietikäinen,et al.  Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[7]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[8]  Lei Huang,et al.  A New Block Partitioned Text Feature for Text Verification , 2009, 2009 10th International Conference on Document Analysis and Recognition.

[9]  Lionel Prevost,et al.  2009 10th International Conference on Document Analysis and Recognition Text Detection and Localization in Complex Scene Images using Constrained AdaBoost Algorithm , 2022 .

[10]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[11]  Michael H. Brill,et al.  Color appearance models , 1998 .

[12]  Alvin F. Martin,et al.  The DET curve in assessment of detection task performance , 1997, EUROSPEECH.

[13]  Ming Zhao,et al.  Text detection in images using sparse representation with discriminative dictionaries , 2010, Image Vis. Comput..

[14]  Jean-Marc Odobez,et al.  Text detection, recognition in images and video frames , 2004, Pattern Recognit..

[15]  Jorge Stolfi,et al.  Snoopertrack: Text detection and tracking for outdoor videos , 2011, 2011 18th IEEE International Conference on Image Processing.

[16]  S.M. Lucas,et al.  ICDAR 2005 text locating competition results , 2005, Eighth International Conference on Document Analysis and Recognition (ICDAR'05).

[17]  Yonatan Wexler,et al.  Detecting text in natural scenes with stroke width transform , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[18]  Palaiahnakote Shivakumara,et al.  Accurate video text detection through classification of low and high contrast images , 2010, Pattern Recognit..

[19]  G. Louloudisa,et al.  Text line detection in handwritten documents , 2008 .

[20]  Chucai Yi,et al.  Text String Detection From Natural Scenes by Structure-Based Partition and Grouping , 2011, IEEE Transactions on Image Processing.

[21]  Kye Kyung Kim,et al.  Scene text extraction in natural scene images using hierarchical feature combining and verification , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[22]  Ken Turkowski,et al.  Filters for common resampling tasks , 1990 .

[23]  Wei Zhang,et al.  Real-time Accurate Object Detection using Multiple Resolutions , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[24]  Wen Gao,et al.  Fast and robust text detection in images and video frames , 2005, Image Vis. Comput..

[25]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[26]  Nicolas Thome,et al.  A cognitive and video-based approach for multinational License Plate Recognition , 2010, Machine Vision and Applications.

[27]  Matthieu Cord,et al.  An efficient system for combining complementary kernels in complex visual categorization tasks , 2010, 2010 IEEE International Conference on Image Processing.

[28]  GaoWen,et al.  Fast and robust text detection in images and video frames , 2005 .

[29]  Alan L. Yuille,et al.  Detecting and reading text in natural scenes , 2004, CVPR 2004.

[30]  Eleftherios Kayafas,et al.  Operator context scanning to support high segmentation rates for real time license plate recognition , 2010, Pattern Recognition.

[31]  Mahdi Nezamabadi,et al.  Color Appearance Models , 2014, J. Electronic Imaging.

[32]  Frank Lebourgeois,et al.  Text search for medieval manuscript images , 2007, Pattern Recognit..