Paper Information - A Machine Learning Approach to Hypothesis Decoding in Scene Text Recognition

Scene Text Recognition (STR) is the task of localizing and transcribing textual information captured in real-world images. As its accuracy increases, it is becoming a new source of textual data for standard Natural Language Processing tasks, and it poses new problems because of the specific nature of scene text. In this paper, we learn a string hypothesis decoding procedure in an STR pipeline using structured prediction methods that have proved useful in Automatic Speech Recognition and Machine Translation. The model allows a wide range of typographical and language features to be employed in the decoding process. The proposed method is evaluated on a standard dataset and improves both character and word recognition performance over the baseline.
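
The abstract does not say which structured prediction method or which features are used, so the following is only an illustrative sketch of the general idea: a structured perceptron that learns to pick the correct string among competing recognizer hypotheses. The feature map (`ocr_score`, `mixed_case`, character bigrams), the function names, and the toy data are all assumptions made for this example, not the paper's actual design.

```python
from collections import Counter


def features(hypothesis, ocr_score):
    """Hypothetical feature map: recognizer confidence plus one
    typographical cue and simple character-bigram language cues."""
    feats = Counter()
    feats["ocr_score"] = ocr_score
    pairs = list(zip(hypothesis, hypothesis[1:]))
    # Typographical cue: a lowercase letter directly followed by an uppercase one.
    feats["mixed_case"] = float(any(a.islower() and b.isupper() for a, b in pairs))
    # Language cue: case-insensitive character bigram counts.
    for a, b in pairs:
        feats["bigram=" + a.lower() + b.lower()] += 1.0
    return feats


def score(weights, feats):
    """Linear model: dot product of the weight vector and the features."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def decode(weights, candidates):
    """Return the string of the highest-scoring (string, ocr_score) candidate."""
    return max(candidates, key=lambda c: score(weights, features(*c)))[0]


def train(data, epochs=5):
    """Structured perceptron training.

    data: list of (candidates, gold_string) pairs, where the gold string
    is assumed to appear among the candidates.
    """
    weights = {}
    for _ in range(epochs):
        for candidates, gold in data:
            predicted = decode(weights, candidates)
            if predicted == gold:
                continue
            scores = dict(candidates)
            # Move the weights toward the gold features, away from the prediction.
            for name, value in features(gold, scores[gold]).items():
                weights[name] = weights.get(name, 0.0) + value
            for name, value in features(predicted, scores[predicted]).items():
                weights[name] = weights.get(name, 0.0) - value
    return weights


# Toy data: the recognizer prefers "EXlT" (lowercase L) over the correct "EXIT".
toy_candidates = [("EXlT", 0.9), ("EXIT", 0.8)]
toy_weights = train([(toy_candidates, "EXIT")])
toy_result = decode(toy_weights, toy_candidates)
```

In this toy setup the learned weights penalize the mixed-case pattern and the unusual bigrams, so the decoder can override a higher raw recognizer score. A real system would generate candidates from the recognizer's search lattice and use far richer features.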
