An efficient hybrid scheme for key frame extraction and text localization in video

Efficient algorithms for detecting caption text and scene text in video sequences are in high demand in multimedia indexing and data retrieval. Because of low resolution, low contrast, complex backgrounds, and text with multiple orientations, styles, colors, and alignments, scene text extraction from video images is an especially challenging task. This paper proposes a method that extracts key frames from a video using color moments and then performs text localization only on those key frames. Because the text content does not change from frame to frame, restricting text extraction to key frames reduces the processing time of the algorithm. The paper further proposes a robust hybrid method to localize scene and graphic text in video frames using the 2-D Haar discrete wavelet transform (DWT), a Laplacian of Gaussian filter, and the maximum gradient difference method. The DWT provides a fast decomposition of an image into one approximation component and three detail components. The three detail components carry the vertical, horizontal, and diagonal edge information of the image, which is used to differentiate text from the rest of the image. The maximum gradient difference method further refines the text localization, and the gradient difference magnitude is used in the thresholding process. A dynamic thresholding technique converts the images into binary form. Because this technique obtains a different threshold value for each image, it can be used for automatic text localization in video sequences. Two mask operators have been employed to obtain an equation which, when applied to each pixel, provides the intended threshold value. False positives are eliminated using morphological operations, and connected component analysis is then performed to localize the text.
The comparison metrics in the results show that the proposed method performs well in terms of detection rate, false alarm rate, and misdetection rate.
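
The first two stages described above, key frame selection from color moments and the single-level 2-D Haar decomposition, can be illustrated with a minimal sketch. It is not the implementation used in the paper: the frame representation (a list of (r, g, b) tuples), the function names, the L1 distance between moment vectors, the fixed threshold, and the averaging normalization of the Haar filters are all assumptions made for illustration.

```python
import math


def color_moments(frame):
    """Return [mean, std, skewness] for each of the three color channels.

    `frame` is assumed to be a non-empty list of (r, g, b) tuples
    (a hypothetical representation chosen for this sketch).
    """
    n = len(frame)
    moments = []
    for c in range(3):
        values = [px[c] for px in frame]
        mean = sum(values) / n
        var = sum((v - mean) ** 2 for v in values) / n
        third = sum((v - mean) ** 3 for v in values) / n
        # Signed cube root of the third central moment.
        skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
        moments.extend([mean, math.sqrt(var), skew])
    return moments


def select_key_frames(frames, threshold):
    """Return indices of frames whose color moments differ from the
    most recent key frame by more than `threshold` (L1 distance).

    The distance measure and threshold rule are assumptions.
    """
    if not frames:
        return []
    keys = [0]
    ref = color_moments(frames[0])
    for i in range(1, len(frames)):
        cur = color_moments(frames[i])
        dist = sum(abs(a - b) for a, b in zip(ref, cur))
        if dist > threshold:
            keys.append(i)
            ref = cur
    return keys


def haar_dwt2(img):
    """One-level 2-D Haar decomposition of a grayscale image.

    `img` is a list of rows with even height and width. Returns
    (approx, horizontal, vertical, diagonal), each half the size of
    the input. Averaging normalization (division by 4) is assumed.
    """
    approx, horiz, vert, diag = [], [], [], []
    for y in range(0, len(img), 2):
        ra, rh, rv, rd = [], [], [], []
        for x in range(0, len(img[0]), 2):
            a, b = img[y][x], img[y][x + 1]
            c, d = img[y + 1][x], img[y + 1][x + 1]
            ra.append((a + b + c + d) / 4)  # local average
            rh.append((a + b - c - d) / 4)  # top vs bottom: horizontal edges
            rv.append((a - b + c - d) / 4)  # left vs right: vertical edges
            rd.append((a - b - c + d) / 4)  # diagonal detail
        approx.append(ra)
        horiz.append(rh)
        vert.append(rv)
        diag.append(rd)
    return approx, horiz, vert, diag
```

In such a pipeline, `select_key_frames` would be run once over the video, and `haar_dwt2` would then be applied only to the selected frames, so that the three detail sub-bands are available for the later gradient difference and thresholding steps.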
