Natural Scene Character Recognition Without Dependency on Specific Features

Current methods for scene character recognition rely heavily on the discriminative power of local features such as HoG, SIFT, Shape Contexts (SC), and Geometric Blur (GB). One problem with this approach is that the local features are rasterized in an ad hoc manner into a single vector, which disrupts spatial correlations that carry crucial information. To eliminate this feature dependency and its associated problems, we propose a holistic solution. For each character class to be recognized, we stack a set of training images to form a 3-mode tensor. Each training tensor is then decomposed into a linear superposition of k rank-1 matrices, which form a basis spanning the solution subspace of that character class. To classify a test image, we project it onto the pre-computed rank-1 bases of each class and assign it to the class for which the inner product of mixing vectors is maximized. We evaluate on challenging natural scene character datasets, namely Chars74K, ICDAR2003, and SVT-CHAR. Our results are better than several baseline methods based on local features (e.g., HoG), and we show that leave-random-one-out cross-validation yields even better recognition performance, supporting our intuition that feature independence and the preservation of spatial correlations are important in recognition.
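
The pipeline described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: a plain alternating-least-squares CP decomposition stands in for whatever decomposition the paper uses, the scoring rule is one possible reading of "inner product of mixing vectors" (the test image's mixing vector is compared with each training mixing vector and scaled by the image norm so that a poor projection scores low), and all function names and the synthetic stroke data are hypothetical.

```python
import numpy as np

def khatri_rao(X, Y):
    """Column-wise Kronecker product of X (I x k) and Y (J x k) -> (I*J x k)."""
    return np.einsum("ir,jr->ijr", X, Y).reshape(-1, X.shape[1])

def cp_als(T, k, iters=200, seed=0):
    """Fit T (h x w x n) as a sum of k rank-1 terms a_r, b_r, c_r by plain ALS."""
    rng = np.random.default_rng(seed)
    h, w, n = T.shape
    A = rng.standard_normal((h, k))
    B = rng.standard_normal((w, k))
    C = rng.standard_normal((n, k))
    T0 = T.reshape(h, w * n)                            # mode-1 unfolding
    T1 = np.transpose(T, (1, 0, 2)).reshape(w, h * n)   # mode-2 unfolding
    T2 = np.transpose(T, (2, 0, 1)).reshape(n, h * w)   # mode-3 unfolding
    for _ in range(iters):
        A = T0 @ np.linalg.pinv(khatri_rao(B, C).T)
        B = T1 @ np.linalg.pinv(khatri_rao(A, C).T)
        C = T2 @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C

def fit_class(images, k):
    """images: (n, h, w). Returns unit-norm vectorized rank-1 bases and mixing vectors."""
    T = np.transpose(images, (1, 2, 0))          # stack images into an h x w x n tensor
    A, B, C = cp_als(T, k)
    M = khatri_rao(A, B)                         # column r is vec(a_r b_r^T)
    scale = np.linalg.norm(M, axis=0) + 1e-12
    return M / scale, C * scale                  # move the scale into the mixing vectors

def score(x, model):
    """Assumed scoring rule: best match between test and training mixing vectors."""
    M, C = model
    c, *_ = np.linalg.lstsq(M, x.reshape(-1), rcond=None)   # project onto the bases
    sims = (C @ c) / (np.linalg.norm(C, axis=1) * np.linalg.norm(x))
    return sims.max()

def classify(x, models):
    return max(models, key=lambda label: score(x, models[label]))

def toy_class(bases, n, rng):
    """Synthetic images: random positive mixtures of the given rank-1 patterns."""
    coeffs = rng.uniform(0.5, 1.5, size=(n, len(bases)))
    imgs = np.tensordot(coeffs, np.stack(bases), axes=1)    # n x h x w
    return imgs + 0.01 * rng.standard_normal(imgs.shape)

rng = np.random.default_rng(0)
top = np.r_[np.ones(4), np.zeros(4)]
bottom = np.r_[np.zeros(4), np.ones(4)]
e = np.eye(8)
vertical = [np.outer(top, e[2]), np.outer(bottom, e[5])]     # two vertical strokes
horizontal = [np.outer(e[2], top), np.outer(e[5], bottom)]   # two horizontal strokes

models = {
    "vertical": fit_class(toy_class(vertical, 20, rng), k=2),
    "horizontal": fit_class(toy_class(horizontal, 20, rng), k=2),
}
probe_v = toy_class(vertical, 1, rng)[0]
probe_h = toy_class(horizontal, 1, rng)[0]
print(classify(probe_v, models), classify(probe_h, models))
```

Because the images are never converted to feature vectors before the decomposition, the row and column structure of each character is kept in the factors `A` and `B`, which is the spatial correlation the abstract argues local-feature pipelines lose.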
