Paper Information - Robust and Accurate Text Stroke Segmentation

Robust and Accurate Text Stroke Segmentation

We propose a new technique for accurately segmenting text strokes in an image. The algorithm takes as input a cropped image containing a word. It first performs a coarse segmentation using a Fully Convolutional Network (FCN). Although not accurate, this initial segmentation usually identifies most of the text stroke content even in difficult conditions, such as uneven lighting and non-uniform backgrounds. The segmentation is then refined using a fully connected Conditional Random Field (CRF) with a novel kernel definition that includes stroke width information. To train the network, we created a new synthetic data set of 100K text images. Tested against standard benchmarks with pixel-level annotation (ICDAR 2003, ICDAR 2011, and SVT), our algorithm outperforms the state of the art by a noticeable margin.
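
The abstract does not state the kernel definition, so the following is only a minimal sketch of the refinement stage under stated assumptions, not the authors' implementation. It assumes a single Gaussian kernel over pixel position, gray-level intensity, and a per-pixel stroke width estimate, a Potts label compatibility, and naive O(N^2) mean-field updates that are practical only for small crops. The function name `refine_with_dense_crf`, the bandwidths `theta_pos`, `theta_int`, `theta_sw`, and the `stroke_width` input (which could come, for instance, from a stroke width transform) are all hypothetical.

```python
import numpy as np

def refine_with_dense_crf(fg_prob, gray, stroke_width,
                          theta_pos=3.0, theta_int=0.1, theta_sw=2.0,
                          weight=1.0, n_iters=5):
    """Refine a coarse text probability map with a binary fully connected CRF.

    fg_prob, gray and stroke_width are 2-D arrays of the same shape.
    The kernel form and all parameters are illustrative assumptions.
    """
    h, w = fg_prob.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Per-pixel feature vector; each channel is scaled by its kernel bandwidth.
    feats = np.stack([ys.ravel() / theta_pos,
                      xs.ravel() / theta_pos,
                      gray.ravel() / theta_int,
                      stroke_width.ravel() / theta_sw], axis=1)
    # Gaussian kernel between every pair of pixels (dense N x N matrix).
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(axis=2)
    kernel = np.exp(-0.5 * d2)
    np.fill_diagonal(kernel, 0.0)  # a pixel sends no message to itself

    eps = 1e-6
    p = np.clip(fg_prob.ravel(), eps, 1.0 - eps)
    q = np.stack([1.0 - p, p], axis=1)   # columns: background, text
    unary = -np.log(q)                   # unary energy from the coarse stage
    for _ in range(n_iters):
        # With a Potts compatibility, the pairwise penalty for a label equals
        # a label-independent constant minus the kernel-weighted support.
        support = kernel @ q
        logits = -unary + weight * support
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        q = np.exp(logits)
        q /= q.sum(axis=1, keepdims=True)
    return q[:, 1].reshape(h, w)

# Toy example: a 2-pixel-thick horizontal stroke on a dark background.
gray = np.zeros((8, 12))
gray[3:5, :] = 1.0
stroke_width = np.zeros((8, 12))
stroke_width[3:5, :] = 2.0
coarse = np.full((8, 12), 0.2)
coarse[3:5, :] = 0.8
coarse[3, 5] = 0.4   # stroke pixel missed by the coarse stage
coarse[0, 0] = 0.6   # spurious response in the background
refined = refine_with_dense_crf(coarse, gray, stroke_width)
```

In this toy case the two deliberately wrong pixels are pulled toward the label of the pixels they resemble in position, intensity, and stroke width. A practical system would replace the dense kernel matrix with an efficient approximate filtering scheme, since the matrix grows quadratically with the number of pixels.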
