Robust and Accurate Text Stroke Segmentation

We propose a new technique for the accurate segmentation of text strokes from an image. The algorithm takes as input a cropped image containing a word. It first performs a coarse segmentation using a Fully Convolutional Network (FCN). Although not precise, this initial segmentation usually identifies most of the text stroke content even in difficult situations, such as uneven lighting and non-uniform backgrounds. The segmentation is then refined using a fully connected Conditional Random Field (CRF) with a novel kernel definition that includes stroke width information. To train the network, we created a new synthetic data set of 100K text images. Tested against standard benchmarks with pixel-level annotation (ICDAR 2003, ICDAR 2011, and SVT), our algorithm outperforms the state of the art by a noticeable margin.
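The two-stage idea (coarse per-pixel probabilities, then refinement with a fully connected CRF) can be illustrated with a small NumPy sketch. The abstract does not give the kernel definition, so everything below is an assumption made for illustration: the kernel is the common Gaussian appearance-plus-smoothness form with a stroke width term added to the appearance part, the parameter names and values (`theta_p`, `theta_c`, `theta_s`, `theta_g`, `w_app`, `w_smooth`) are hypothetical, the coarse probabilities are hand-made rather than FCN output, and inference is a naive O(N^2) mean-field loop suitable only for tiny images.

```python
import numpy as np

def pairwise_kernel(pos, color, stroke_w, theta_p=3.0, theta_c=0.2,
                    theta_s=2.0, theta_g=1.0, w_app=1.0, w_smooth=0.5):
    """Dense (N, N) Gaussian kernel over all pixel pairs.

    Assumed form (not taken from the paper): an appearance term over
    position, color and stroke width, plus a position-only smoothness term.
    """
    def sqdist(f):
        d = f[:, None, :] - f[None, :, :]
        return (d ** 2).sum(axis=-1)

    dp = sqdist(pos)
    dc = sqdist(color)
    ds = sqdist(stroke_w[:, None])
    appearance = np.exp(-dp / (2 * theta_p ** 2)
                        - dc / (2 * theta_c ** 2)
                        - ds / (2 * theta_s ** 2))
    smooth = np.exp(-dp / (2 * theta_g ** 2))
    k = w_app * appearance + w_smooth * smooth
    np.fill_diagonal(k, 0.0)  # a pixel sends no message to itself
    return k

def mean_field(unary_prob, kernel, n_iters=5):
    """Naive mean-field inference with a Potts label compatibility.

    unary_prob: (N, L) per-pixel label probabilities from a coarse segmenter.
    Returns the refined (N, L) marginals.
    """
    unary = -np.log(np.clip(unary_prob, 1e-8, 1.0))
    q = unary_prob.copy()
    for _ in range(n_iters):
        msg = kernel @ q  # (N, L): kernel-weighted label mass from other pixels
        # Potts penalty for label l: kernel mass currently on the other labels.
        pairwise = msg.sum(axis=1, keepdims=True) - msg
        logits = -unary - pairwise
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        q = np.exp(logits)
        q /= q.sum(axis=1, keepdims=True)
    return q

# Toy 6x6 "word image": a 2-pixel-wide dark vertical stroke on a light background.
h, w = 6, 6
mask = np.zeros((h, w), dtype=bool)
mask[:, 2:4] = True
rows, cols = np.mgrid[0:h, 0:w]
pos = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)
color = np.where(mask, 0.0, 1.0).ravel()[:, None]
stroke_w = np.where(mask, 2.0, 0.0).ravel()  # hypothetical stroke width map

# Hand-made stand-in for the coarse stage: column 0 = background, column 1 = text.
p_text = np.where(mask, 0.8, 0.2).ravel()
p_text[3 * w + 2] = 0.4  # one stroke pixel the coarse stage mislabels
coarse = np.stack([1.0 - p_text, p_text], axis=1)

refined = mean_field(coarse, pairwise_kernel(pos, color, stroke_w))
labels = refined.argmax(axis=1).reshape(h, w)
```

In this toy setup the mislabelled stroke pixel shares color and stroke width with its stroke neighbors, so their messages outweigh its weak unary term and it is pulled back to the text label. A practical implementation would replace the dense N-by-N kernel with an efficient filtering-based approximation.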
