Sharpness and Contrast Based Features for Word-Wise Video Type Classification

Recognizing words with a single algorithm across different image types, namely video-scene, video-caption, natural-scene, mobile-camera, and born-digital images, is very difficult because each type poses a different level of difficulty. This paper presents a new method that combines sharpness and contrast features to classify image types at the word level using the Saturation (S) and Intensity (I) spaces of HSI. The proposed method first applies a one-dimensional filter to smooth each input image. It then performs a Maximum Value Difference (MVD) operation on the smoothed image to sharpen edge details. Next, clustering is applied to the enhanced images to identify text candidates. The proposed method extracts sharpness and contrast features in a new way from the text candidate images in the S and I spaces. K-means clustering is further employed on the extracted set of sharpness and contrast features to obtain different clusters for each space, which results in a feature vector. The feature vector is then fed to an SVM classifier for classification. We evaluate the proposed method on standard datasets, namely ICDAR 2013, ICDAR 2015 video, natural scene data, caption texts, and born-digital data, together with our own images captured by a mobile camera. A comparative study of classification experiments shows that the proposed method outperforms the existing methods. Recognition experiments before and after classification show that the proposed scheme is useful and effective.
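
The front end of the pipeline summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not define the filter, the MVD operation, or the feature formulas, so everything below is an assumption made for illustration. In particular, the smoothing is taken to be a row-wise mean filter, MVD is taken to be the maximum minus the minimum in a horizontal sliding window, text candidates are picked by a simple two-cluster split of the MVD response, sharpness is the mean gradient magnitude over candidates, and contrast is their value range. All function names (`rgb_to_si`, `smooth_rows`, `max_value_difference`, `two_means_mask`, `word_features`) are hypothetical. The later K-means grouping of features and the SVM classifier are omitted.

```python
import numpy as np

def rgb_to_si(rgb):
    """Saturation and Intensity planes of HSI for an RGB image scaled to [0, 1]."""
    rgb = rgb.astype(np.float64)
    intensity = rgb.mean(axis=2)
    saturation = 1.0 - rgb.min(axis=2) / np.maximum(intensity, 1e-12)
    saturation[intensity == 0] = 0.0          # black pixels carry no saturation
    return np.clip(saturation, 0.0, 1.0), intensity

def smooth_rows(plane, width=3):
    """Assumed 1-D filter: mean filter applied along each row (odd width)."""
    half = width // 2
    padded = np.pad(plane, ((0, 0), (half, half)), mode="edge")
    kernel = np.ones(width) / width
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="valid"), 1, padded)

def max_value_difference(plane, width=3):
    """Assumed MVD: max minus min inside a horizontal sliding window."""
    half = width // 2
    padded = np.pad(plane, ((0, 0), (half, half)), mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, width, axis=1)
    return windows.max(axis=-1) - windows.min(axis=-1)

def two_means_mask(values, iters=20):
    """Two-cluster split of scalar values; True marks the high-valued cluster."""
    lo, hi = values.min(), values.max()
    mask = np.zeros(values.shape, dtype=bool)
    for _ in range(iters):
        mask = np.abs(values - hi) < np.abs(values - lo)
        if mask.all() or not mask.any():
            break
        lo, hi = values[~mask].mean(), values[mask].mean()
    return mask

def word_features(rgb):
    """Return [S sharpness, S contrast, I sharpness, I contrast] for a word image."""
    features = []
    for plane in rgb_to_si(rgb):
        enhanced = max_value_difference(smooth_rows(plane))
        candidates = two_means_mask(enhanced)   # assumed text-candidate pixels
        if not candidates.any():
            features += [0.0, 0.0]
            continue
        gy, gx = np.gradient(plane)
        sharpness = np.hypot(gx, gy)[candidates].mean()
        values = plane[candidates]
        features += [float(sharpness), float(values.max() - values.min())]
    return np.array(features)

# Usage on a random stand-in for a cropped word image.
rng = np.random.default_rng(0)
print(word_features(rng.random((16, 48, 3))))
```

In a full system along the lines of the abstract, vectors like these would be grouped with K-means per colour space and passed to an SVM; those stages depend on details the abstract does not give, so they are left out here.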
