Finding Text in Natural Scenes by Figure-Ground Segmentation

Much past research on finding text in natural scenes uses bottom-up grouping processes to detect candidate text features as a first processing step. While such grouping procedures are a fast and efficient way of extracting the parts of an image that are most likely to contain text, they still produce large numbers of false positives, which must be pruned before the remaining candidates can be read by OCR. We argue that figure-ground segmentation is a natural framework for pruning these false positive text features. The segmentation is implemented using a graphical model, specifically a Markov random field (MRF), in which each candidate text feature is represented by a node. Since each node has only two possible states (figure and ground), and since the connectivity of the graphical model is sparse, we can perform rapid inference on the graph using belief propagation. We show promising results on a variety of urban and indoor scene images containing signs, demonstrating the feasibility of the approach.
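To make the inference step concrete, the following is a minimal sketch of sum-product belief propagation on a sparse pairwise MRF with two states per node (0 = ground, 1 = figure). It is an illustration under stated assumptions, not the implementation described here: the function name `loopy_bp_binary`, the form of the unary and pairwise potentials, and the example values are all hypothetical, since the abstract does not specify them.

```python
import numpy as np

def loopy_bp_binary(unary, edges, pairwise, n_iters=50):
    """Sum-product (loopy) belief propagation on a binary pairwise MRF.

    unary:    (N, 2) array of strictly positive node potentials
              (column 0 = ground, column 1 = figure). Assumed form.
    edges:    list of undirected edges (i, j); sparse by assumption.
    pairwise: dict mapping (i, j) -> (2, 2) array psi[x_i, x_j].
    Returns an (N, 2) array of approximate marginals (beliefs).
    """
    unary = np.asarray(unary, dtype=float)
    n = unary.shape[0]
    neighbors = {k: [] for k in range(n)}
    psi = {}
    messages = {}  # messages[(i, j)]: message from i to j, indexed by x_j
    for (i, j) in edges:
        p = np.asarray(pairwise[(i, j)], dtype=float)
        psi[(i, j)] = p
        psi[(j, i)] = p.T
        messages[(i, j)] = np.full(2, 0.5)
        messages[(j, i)] = np.full(2, 0.5)
        neighbors[i].append(j)
        neighbors[j].append(i)

    for _ in range(n_iters):
        new_messages = {}
        for (i, j) in messages:
            # Combine node i's evidence with all incoming messages except j's.
            prod = unary[i].copy()
            for k in neighbors[i]:
                if k != j:
                    prod *= messages[(k, i)]
            # Sum over x_i of psi[x_i, x_j] * prod[x_i].
            m = psi[(i, j)].T @ prod
            new_messages[(i, j)] = m / m.sum()  # normalize for stability
        messages = new_messages

    beliefs = unary.copy()
    for i in range(n):
        for k in neighbors[i]:
            beliefs[i] *= messages[(k, i)]
    beliefs /= beliefs.sum(axis=1, keepdims=True)
    return beliefs

# Hypothetical example: three candidate features in a chain, with an
# attractive pairwise potential that favors neighbors sharing a label.
example_unary = np.array([[0.4, 0.6], [0.5, 0.5], [0.8, 0.2]])
example_edges = [(0, 1), (1, 2)]
attract = np.array([[2.0, 1.0], [1.0, 2.0]])
example_beliefs = loopy_bp_binary(
    example_unary, example_edges, {e: attract for e in example_edges}
)
figure_mask = example_beliefs[:, 1] > 0.5  # keep nodes judged to be figure
```

Because each message is a length-2 vector and each node has few neighbors, one sweep costs time proportional to the number of edges, which is the reason binary states and sparse connectivity make inference fast. On a graph with cycles the beliefs are only approximate; on a tree, such as the chain in the example, they match the exact marginals.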
