Temporal Integration for Word-Wise Caption and Scene Text Identification

Video generally contains both edited text (caption text) and natural text (scene text), and the two differ in nature as well as in characteristics. These differences lead to poor text recognition accuracy in video. In this paper, we explore wavelet decomposition and temporal coherency for classifying caption and scene text. We use the high-frequency wavelet sub-bands to separate text candidates, which are represented by high-frequency coefficients, in an input word image. The proposed method studies the distribution of text candidates over word images, based on the observation that the standard deviation of text candidates is high in the first zone, low in the middle zone, and high in the third zone. This property is extracted by mapping the standard deviation values to 8 equal-sized bins formed from the range of those values. The correlation among bins at the first and second wavelet levels is used both to differentiate caption text from scene text and to determine the number of temporal frames to analyze. The properties of caption and scene text are then validated over the chosen temporal frames to find the stable property used for classification. Experimental results on three standard datasets (ICDAR 2015, YVT, and License Plate Video) show that the proposed method outperforms existing methods in terms of classification rate, and that using the classification results significantly improves the recognition rate.
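The binning and correlation step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a Haar wavelet, the summed absolute value of the three detail sub-bands as the text-candidate map, and a per-column standard deviation; the abstract specifies none of these. All function names are hypothetical.

```python
from math import sqrt
from statistics import pstdev


def haar_level(img):
    """One level of a 2D Haar decomposition on a list-of-lists image.

    Returns (LL, LH, HL, HH). Assumption: Haar is used only for simplicity.
    """
    rows, cols = len(img) // 2 * 2, len(img[0]) // 2 * 2
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, rows, 2):
        ll_r, lh_r, hl_r, hh_r = [], [], [], []
        for c in range(0, cols, 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            ll_r.append((a + b + d + e) / 4)  # approximation
            lh_r.append((a + b - d - e) / 4)  # difference between rows
            hl_r.append((a - b + d - e) / 4)  # difference between columns
            hh_r.append((a - b - d + e) / 4)  # diagonal detail
        ll.append(ll_r)
        lh.append(lh_r)
        hl.append(hl_r)
        hh.append(hh_r)
    return ll, lh, hl, hh


def bin_histogram(values, n_bins=8):
    """Count values in n_bins equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # A zero range puts everything in bin 0; the maximum goes in the last bin.
        idx = 0 if width == 0 else min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def bin_signatures(img, levels=2, n_bins=8):
    """Return one 8-bin histogram of column-wise standard deviations per level."""
    signatures, current = [], img
    for _ in range(levels):
        current, lh, hl, hh = haar_level(current)
        # Assumed text-candidate map: summed magnitude of the detail sub-bands.
        mag = [[abs(p) + abs(q) + abs(s) for p, q, s in zip(r1, r2, r3)]
               for r1, r2, r3 in zip(lh, hl, hh)]
        col_std = [pstdev(col) for col in zip(*mag)]
        signatures.append(bin_histogram(col_std, n_bins))
    return signatures


def level_correlation(img):
    """Correlation between the first- and second-level bin histograms."""
    first, second = bin_signatures(img, levels=2)
    return pearson(first, second)
```

A word image of at least 4x4 pixels is needed for two decomposition levels. How the resulting correlation is thresholded, and how it determines the number of temporal frames, is not stated in the abstract and is left out of this sketch.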
