Towards visual words to words

We address the problem of text localization and retrieval in real world images. We are first to study the retrieval of text images, i.e. the selection of images containing text in large collections at high speed. We propose a novel representation, textual visual words, which describe text by generic visual words that geometrically consistently predict bottom and top lines of text. The visual words are discretized SIFT descriptors of Hessian features. The features may correspond to various structures present in the text - character fragments, individual characters or their arrangements. The textual words representation is invariant to affine transformation of the image and local linear change of intensity. Experiments demonstrate that the proposed method outperforms the state-of-the-art on the MS dataset. The proposed method detects blurry, small font, low contrast, noisy text from real world images.

[1]  Xu-Cheng Yin,et al.  Robust Text Detection in Natural Scene Images. , 2014, IEEE transactions on pattern analysis and machine intelligence.

[2]  David G. Lowe,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.

[3]  Cordelia Schmid,et al.  Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search , 2008, ECCV.

[4]  Yonatan Wexler,et al.  Detecting text in natural scenes with stroke width transform , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[5]  Robert C. Bolles,et al.  Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[6]  Jon Almazán,et al.  ICDAR 2013 Robust Reading Competition , 2013, 2013 12th International Conference on Document Analysis and Recognition.

[7]  Christof Koch,et al.  AdaBoost for Text Detection in Natural Scene , 2011, 2011 International Conference on Document Analysis and Recognition.

[8]  Slawomir J. Nasuto,et al.  NAPSAC: High Noise, High Dimensional Robust Estimation - it's in the Bag , 2002, BMVC.

[9]  Alan L. Yuille,et al.  Detecting and reading text in natural scenes , 2004, CVPR 2004.

[10]  Cheng-Lin Liu,et al.  Text Localization in Natural Scene Images Based on Conditional Random Field , 2009, 2009 10th International Conference on Document Analysis and Recognition.

[11]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Kai Wang,et al.  End-to-end scene text recognition , 2011, 2011 International Conference on Computer Vision.

[13]  Simon M. Lucas,et al.  ICDAR 2003 robust reading competitions , 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings..

[14]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[15]  Jiřı́ Matas,et al.  Real-time scene text localization and recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Cordelia Schmid,et al.  A Comparison of Affine Region Detectors , 2005, International Journal of Computer Vision.

[17]  Jiri Matas,et al.  Text Localization in Real-World Images Using Efficiently Pruned Exhaustive Search , 2011, 2011 International Conference on Document Analysis and Recognition.

[18]  Rainer Lienhart,et al.  Localizing and segmenting text in images and videos , 2002, IEEE Trans. Circuits Syst. Video Technol..

[19]  Palaiahnakote Shivakumara,et al.  Detecting text in the real world , 2012, ACM Multimedia.

[20]  Xiaolei Huang,et al.  Computing layered surface representations: an algorithm for detecting and separating transparent overlays , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..