Scene Text Segmentation with Multi-level Maximally Stable Extremal Regions

Segmenting scene text from the image background is an important step in scene text recognition. In this paper, we propose a multi-level MSER (Maximally Stable Extremal Regions) technique that identifies the best-quality text candidates from a set of stable regions extracted from different color channel images. To identify these candidates, we define a segmentation score that combines four measures of the text probability of each stable region: 1) stroke width, which captures the low stroke width variation typical of text; 2) boundary curvature, which measures the smoothness of the stable region boundary; 3) character confidence, which measures the likelihood that a stable region is text, based on a pre-trained support vector classifier; and 4) color constancy, which measures the global color consistency of each selected text candidate. Finally, the MSERs with the best segmentation score from each channel are combined to form the final segmentation. The proposed method is evaluated on the ICDAR2003 and SVT datasets, and experiments show that it outperforms both popular document image binarization methods and state-of-the-art scene text segmentation methods.
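
The selection step described above can be illustrated with a minimal sketch. The abstract does not give the exact form of the segmentation score, so the equally weighted sum below, the normalization of every measure to [0, 1], and all names (`Region`, `segmentation_score`, `best_per_channel`) are assumptions made for illustration only, not the formulation used in the paper.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Sequence


@dataclass(frozen=True)
class Region:
    """A candidate stable region with its four measures (hypothetical layout).

    All measures are assumed to be pre-normalized to [0, 1].
    """
    channel: str                  # color channel the region was extracted from
    stroke_width_variation: float # lower means more uniform strokes
    boundary_curvature: float     # lower means a smoother boundary
    char_confidence: float        # classifier confidence, higher is better
    color_constancy: float        # global color consistency, higher is better


def segmentation_score(
    region: Region,
    weights: Sequence[float] = (0.25, 0.25, 0.25, 0.25),
) -> float:
    """Combine the four measures into one score (assumed weighted sum).

    The two "lower is better" measures are inverted so that a higher
    score always indicates a more text-like region.
    """
    terms = (
        1.0 - region.stroke_width_variation,
        1.0 - region.boundary_curvature,
        region.char_confidence,
        region.color_constancy,
    )
    return sum(w * t for w, t in zip(weights, terms))


def best_per_channel(regions: Iterable[Region]) -> Dict[str, Region]:
    """Keep the highest-scoring region for each color channel."""
    best: Dict[str, Region] = {}
    for region in regions:
        current = best.get(region.channel)
        if current is None or segmentation_score(region) > segmentation_score(current):
            best[region.channel] = region
    return best


candidates = [
    Region("R", 0.2, 0.2, 0.8, 0.8),
    Region("R", 0.6, 0.5, 0.4, 0.5),
    Region("G", 0.1, 0.3, 0.9, 0.7),
]
selected = best_per_channel(candidates)
```

In this sketch the per-channel winners in `selected` would then be merged into the final segmentation; how that merge is performed, and how the four measures are actually computed and weighted, is left to the full paper.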
