A Fine-Grained Approach to Scene Text Script Identification

This paper focuses on the problem of script identification in unconstrained scenarios. Script identification is an important prerequisite to recognition, and an indispensable condition for automatic text understanding systems designed for multi-language environments. Although widely studied for document images and handwritten documents, it remains almost unexplored for scene text images. We detail a novel method for script identification in natural images that combines convolutional features with the Naive-Bayes Nearest Neighbor classifier. The proposed method efficiently exploits the discriminative power of small stroke-parts within a fine-grained classification framework. In addition, we propose a new public benchmark dataset for the evaluation of joint text detection and script identification in natural scenes. Experiments on this new dataset demonstrate that the proposed method yields state-of-the-art results and generalizes well to different datasets and to a variable number of scripts. The evidence provided shows that multi-lingual scene text recognition in the wild is a viable proposition. Source code of the proposed method is made available online.
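To make the classifier named in the abstract concrete, the following is a minimal sketch of the generic Naive-Bayes Nearest Neighbor decision rule: each local descriptor of a query image is matched to its nearest neighbor in every class's descriptor pool, the squared distances are summed per class, and the class with the smallest total wins. It is not the authors' implementation. The function name `nbnn_classify`, the labels `script_a` and `script_b`, and the synthetic random descriptors are assumptions made for illustration; in the paper the descriptors would be convolutional features of stroke-parts.

```python
import numpy as np

def nbnn_classify(query_descriptors, class_descriptors):
    """Pick the class with the smallest image-to-class distance (generic NBNN rule).

    query_descriptors: array of shape (n_query, dim), local descriptors of one image.
    class_descriptors: dict mapping a label to an array of shape (n_pool, dim).
    """
    totals = {}
    for label, pool in class_descriptors.items():
        # Squared Euclidean distance from every query descriptor to every pool descriptor.
        diffs = query_descriptors[:, None, :] - pool[None, :, :]
        sq_dists = np.sum(diffs ** 2, axis=2)
        # Image-to-class distance: sum of nearest-neighbor distances over the query descriptors.
        totals[label] = float(sq_dists.min(axis=1).sum())
    best_label = min(totals, key=totals.get)
    return best_label, totals

# Synthetic stand-ins for per-script descriptor pools (hypothetical data).
rng = np.random.default_rng(0)
pools = {
    "script_a": rng.normal(0.0, 0.1, size=(50, 8)),
    "script_b": rng.normal(1.0, 0.1, size=(50, 8)),
}
# A query image whose descriptors are drawn near the "script_b" pool.
query = rng.normal(1.0, 0.1, size=(5, 8))
label, totals = nbnn_classify(query, pools)
print(label, totals)
```

The brute-force distance computation keeps the sketch short; with large descriptor pools an approximate nearest-neighbor index would normally replace it.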
