A New Method for Text Verification Based on Random Forests

Text in image or video frames carries rich high-level semantic information that is useful for multimedia indexing and management. Coarse text detection results may contain many false alarms, which must be eliminated before further recognition. Because text has distinctive textural features, texture-based classifiers such as SVM, MLP and AdaBoost have been used to classify detected regions as text or non-text. In this paper, a text verification method based on random forests is proposed. Random forests were chosen for two reasons: 1) their ability to maintain accuracy with small labeled datasets and 2) their good performance on unbalanced datasets, as is the case with the unbalanced distribution of text and non-text samples. Furthermore, we propose to merge random forests trained on different kinds of features to improve classification accuracy. Comprehensive experimental results show that the proposed methods are effective.
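The abstract does not specify the features, the forest parameters, or how the forests are merged. As a rough illustration only, the Python sketch below trains a small Breiman-style random forest (bootstrap sampling plus a random feature subset at each split) on each of two feature views and merges them by averaging their text probabilities. The averaging rule, the helper names (`train_forest`, `forest_prob`, `merged_is_text`), the parameter values, and the synthetic unbalanced data are all assumptions for illustration, not the authors' implementation.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels (1 = text, 0 = non-text)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)

def build_tree(X, y, depth, max_depth, n_try, rng):
    """Grow a CART-style tree; each split only examines n_try randomly chosen features."""
    if depth >= max_depth or len(set(y)) == 1:
        return ("leaf", sum(y) / len(y))  # leaf stores the fraction of text samples
    best = None
    for f in rng.sample(range(len(X[0])), n_try):
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no valid split on the sampled features
        return ("leaf", sum(y) / len(y))
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return ("node", f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth, n_try, rng),
            build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth, n_try, rng))

def predict_tree(node, x):
    """Walk from the root to a leaf and return that leaf's text fraction."""
    while node[0] == "node":
        _, f, t, left, right = node
        node = left if x[f] <= t else right
    return node[1]

def train_forest(X, y, n_trees=25, max_depth=4, seed=0):
    """Bagging plus random feature selection at each split (Breiman-style forest)."""
    rng = random.Random(seed)
    n_try = max(1, int(len(X[0]) ** 0.5))  # sqrt(#features), a common default
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 0, max_depth, n_try, rng))
    return forest

def forest_prob(forest, x):
    """Soft vote: mean of the per-tree text fractions."""
    return sum(predict_tree(tree, x) for tree in forest) / len(forest)

def merged_is_text(forests, views, threshold=0.5):
    """Hypothetical fusion rule: average P(text) over forests, each fed its own feature view."""
    p = sum(forest_prob(f, v) for f, v in zip(forests, views)) / len(forests)
    return p >= threshold

# Toy, synthetic, unbalanced data: 20 "text" vs 80 "non-text" regions,
# described by two hypothetical feature views (e.g. edge-based and contrast-based).
rng = random.Random(1)
labels = [1] * 20 + [0] * 80
view_a = [[rng.uniform(0.6, 1.0) if lab else rng.uniform(0.0, 0.4) for _ in range(3)]
          for lab in labels]
view_b = [[rng.uniform(0.6, 1.0) if lab else rng.uniform(0.0, 0.4) for _ in range(2)]
          for lab in labels]

forests = [train_forest(view_a, labels, seed=2), train_forest(view_b, labels, seed=3)]
print(merged_is_text(forests, [[0.9, 0.8, 0.95], [0.85, 0.9]]))  # text-like region
print(merged_is_text(forests, [[0.1, 0.05, 0.2], [0.1, 0.15]]))  # background-like region
```

Averaging leaf frequencies (soft voting) is used here instead of hard majority voting so that forests trained on different feature views can be combined on a common probability scale; a weighted average or another fusion rule would be an equally plausible reading of "merge".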
