论文信息 - Learning to Detect Scene Text Using a Higher-Order MRF with Belief Propagation

Learning to Detect Scene Text Using a Higher-Order MRF with Belief Propagation

Detecting text in natural 3D scenes is a challenging problem due to background clutter and photometric/gemetric variations of scene text. Most prior systems adopt approaches based on deterministic rules, lacking a systematic and scalable framework. In this paper, we present a parts-based approach for 3D scene text detection using a higher-order MRF model. The higher-order structure is used to capture the spatial-feature relations among multiple parts in scene text. The use of higher-order structure and the feature-dependent potential function represents significant departure from the conventional pairwise MRF, which has been successfully applied in several low-level applications. We further develop a variational approximation method, in the form of belief propagation, for inference in the higher-order model. Our experiments using the ICDAR'03 benchmark showed promising results in detecting scene text with significant geometric variations, background clutter on planar surfaces or non-planar surfaces with limited angles.

Shih-Fu Chang | Dong-Qing Zhang

[1] Anil K. Jain,et al. Automatic text location in images and video frames , 1998, Proceedings. Fourteenth International Conference on Pattern Recognition (Cat. No.98EX170).

[2] William T. Freeman,et al. Understanding belief propagation and its generalizations , 2003 .

[3] Jiang Gao,et al. An adaptive algorithm for text detection from natural scenes , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[4] William T. Freeman,et al. Learning Low-Level Vision , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[5] Pietro Perona,et al. A Probabilistic Approach to Object Recognition Using Local Photometry and Global Geometry , 1998, ECCV.

[6] Yang Song,et al. Towards detection of human motion , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[7] Simon M. Lucas,et al. ICDAR 2003 robust reading competitions , 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings..

[8] Chitra Dorai,et al. Automatic text extraction from video for content-based annotation and retrieval , 1998, Proceedings. Fourteenth International Conference on Pattern Recognition (Cat. No.98EX170).

[9] Jianbo Shi,et al. Object-Specific Figure-Ground Segregation , 2003, CVPR.

[10] David S. Doermann,et al. Automatic text detection and tracking in digital video , 2000, IEEE Trans. Image Process..

[11] Daniel P. Huttenlocher,et al. Efficient matching of pictorial structures , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[12] Dorin Comaniciu,et al. Robust analysis of feature spaces: color image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[13] Nevenka Dimitrova,et al. Text detection for video analysis , 1999, Proceedings IEEE Workshop on Content-Based Access of Image and Video Libraries (CBAIVL'99).

[14] William T. Freeman,et al. Correctness of Belief Propagation in Gaussian Graphical Models of Arbitrary Topology , 1999, Neural Computation.

[15] David A. Forsyth,et al. Probabilistic Methods for Finding People , 2001, International Journal of Computer Vision.

[16] Pietro Perona,et al. Object class recognition by unsupervised scale-invariant learning , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..