Scene Text Recognition Using Similarity and a Lexicon with Sparse Belief Propagation

Scene text recognition (STR) is the recognition of text anywhere in the environment, such as on signs and storefronts. Relative to document recognition, it is challenging because of font variability, minimal language context, and uncontrolled conditions. Much of the information available for solving this problem is frequently ignored or used only sequentially. Similarity between character images is often overlooked as a source of information. Because of language priors, a recognizer may assign different labels to identical characters. Directly comparing characters to each other, rather than only to a model, helps ensure that similar instances receive the same label. Lexicons improve recognition accuracy but are typically applied post hoc. We introduce a probabilistic model for STR that integrates similarity, language properties, and lexical decision. Inference is accelerated with sparse belief propagation, a bottom-up method for shortening messages by reducing the dependency between weakly supported hypotheses. By fusing information sources in one model, we eliminate unrecoverable errors that result from sequential processing, improving accuracy. In experiments recognizing text from images of signs in outdoor scenes, incorporating similarity reduces character recognition error by 19 percent, the lexicon reduces word recognition error by 35 percent, and sparse belief propagation reduces the number of lexicon words considered by 99.9 percent, yielding a 12x speedup with no loss in accuracy.
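
The abstract does not spell out how messages are shortened, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes each character position carries a belief over candidate labels, keeps the smallest set of labels that covers at least 1 - epsilon of the probability mass, and then drops lexicon words that use a pruned label. The function names, the `epsilon` threshold, and the toy beliefs are all hypothetical.

```python
def sparsify_belief(belief, epsilon=1e-3):
    """Keep the smallest set of labels whose total probability is >= 1 - epsilon.

    `belief` maps each candidate label to a nonnegative score. Scores are
    normalized first; the result is renormalized over the kept labels.
    (Assumed pruning rule, for illustration only.)
    """
    total = sum(belief.values())
    if total <= 0:
        raise ValueError("belief must have positive total mass")
    # Rank labels from most to least probable.
    ranked = sorted(belief.items(), key=lambda kv: kv[1], reverse=True)
    kept = {}
    mass = 0.0
    for label, score in ranked:
        p = score / total
        kept[label] = p
        mass += p
        if mass >= 1.0 - epsilon:
            break
    return {label: p / mass for label, p in kept.items()}


def filter_lexicon(lexicon, sparse_beliefs):
    """Keep words whose every character survived pruning at its position."""
    n = len(sparse_beliefs)
    return [
        word for word in lexicon
        if len(word) == n
        and all(ch in sparse_beliefs[i] for i, ch in enumerate(word))
    ]


# Toy per-character beliefs for a three-character word image (made-up numbers).
beliefs = [
    {"c": 0.9, "e": 0.0995, "o": 0.0005},
    {"a": 0.6, "o": 0.4},
    {"t": 0.999, "l": 0.001},
]
sparse = [sparsify_belief(b, epsilon=0.01) for b in beliefs]
candidates = filter_lexicon(["cat", "cot", "eat", "oat", "col"], sparse)
print(candidates)
```

In this toy case the weakly supported labels "o" (first position) and "l" (third position) are pruned, so "oat" and "col" never need to be scored. Applied to a large lexicon, the same idea would shrink the set of words considered during inference, which is the kind of saving the abstract reports.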
