Learning to Detect Scene Text Using a Higher-Order MRF with Belief Propagation

Detecting text in natural 3D scenes is a challenging problem due to background clutter and photometric/gemetric variations of scene text. Most prior systems adopt approaches based on deterministic rules, lacking a systematic and scalable framework. In this paper, we present a parts-based approach for 3D scene text detection using a higher-order MRF model. The higher-order structure is used to capture the spatial-feature relations among multiple parts in scene text. The use of higher-order structure and the feature-dependent potential function represents significant departure from the conventional pairwise MRF, which has been successfully applied in several low-level applications. We further develop a variational approximation method, in the form of belief propagation, for inference in the higher-order model. Our experiments using the ICDAR'03 benchmark showed promising results in detecting scene text with significant geometric variations, background clutter on planar surfaces or non-planar surfaces with limited angles.

[1]  Anil K. Jain,et al.  Automatic text location in images and video frames , 1998, Proceedings. Fourteenth International Conference on Pattern Recognition (Cat. No.98EX170).

[2]  William T. Freeman,et al.  Understanding belief propagation and its generalizations , 2003 .

[3]  Jiang Gao,et al.  An adaptive algorithm for text detection from natural scenes , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[4]  William T. Freeman,et al.  Learning Low-Level Vision , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[5]  Pietro Perona,et al.  A Probabilistic Approach to Object Recognition Using Local Photometry and Global Geometry , 1998, ECCV.

[6]  Yang Song,et al.  Towards detection of human motion , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[7]  Simon M. Lucas,et al.  ICDAR 2003 robust reading competitions , 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings..

[8]  Chitra Dorai,et al.  Automatic text extraction from video for content-based annotation and retrieval , 1998, Proceedings. Fourteenth International Conference on Pattern Recognition (Cat. No.98EX170).

[9]  Jianbo Shi,et al.  Object-Specific Figure-Ground Segregation , 2003, CVPR.

[10]  David S. Doermann,et al.  Automatic text detection and tracking in digital video , 2000, IEEE Trans. Image Process..

[11]  Daniel P. Huttenlocher,et al.  Efficient matching of pictorial structures , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[12]  Dorin Comaniciu,et al.  Robust analysis of feature spaces: color image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[13]  Nevenka Dimitrova,et al.  Text detection for video analysis , 1999, Proceedings IEEE Workshop on Content-Based Access of Image and Video Libraries (CBAIVL'99).

[14]  William T. Freeman,et al.  Correctness of Belief Propagation in Gaussian Graphical Models of Arbitrary Topology , 1999, Neural Computation.

[15]  David A. Forsyth,et al.  Probabilistic Methods for Finding People , 2001, International Journal of Computer Vision.

[16]  Pietro Perona,et al.  Object class recognition by unsupervised scale-invariant learning , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..