Text From Corners: A Novel Approach to Detect Text and Caption in Videos

Detecting text and caption from videos is important and in great demand for video retrieval, annotation, indexing, and content analysis. In this paper, we present a corner based approach to detect text and caption from videos. This approach is inspired by the observation that there exist dense and orderly presences of corner points in characters, especially in text and caption. We use several discriminative features to describe the text regions formed by the corner points. The usage of these features is in a flexible manner, thus, can be adapted to different applications. Language independence is an important advantage of the proposed method. Moreover, based upon the text features, we further develop a novel algorithm to detect moving captions in videos. In the algorithm, the motion features, extracted by optical flow, are combined with text features to detect the moving caption patterns. The decision tree is adopted to learn the classification criteria. Experiments conducted on a large volume of real video shots demonstrate the efficiency and robustness of our proposed approaches and the real-world system. Our text and caption detection system was recently highlighted in a worldwide multimedia retrieval competition, Star Challenge, by achieving the superior performance with the top ranking.

[1]  Korris Fu-Lai Chung,et al.  Hybrid Chinese/English text detection in images and video frames , 2002, Object recognition supported by user interaction for service robots.

[2]  Michael R. Lyu,et al.  A comprehensive method for multilingual video text detection, localization, and extraction , 2005, IEEE Transactions on Circuits and Systems for Video Technology.

[3]  Christopher G. Harris,et al.  A Combined Corner and Edge Detector , 1988, Alvey Vision Conference.

[4]  Anil K. Jain,et al.  Locating text in complex color images , 1995, Proceedings of 3rd International Conference on Document Analysis and Recognition.

[5]  Takeo Kanade,et al.  An Iterative Image Registration Technique with an Application to Stereo Vision , 1981, IJCAI.

[6]  Nevenka Dimitrova,et al.  Text detection for video analysis , 1999, Proceedings IEEE Workshop on Content-Based Access of Image and Video Libraries (CBAIVL'99).

[7]  Sethuraman Panchanathan,et al.  Review of Image and Video Indexing Techniques , 1997, J. Vis. Commun. Image Represent..

[8]  Marcel Worring,et al.  Content-Based Image Retrieval at the End of the Early Years , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[9]  P. Anandan,et al.  Hierarchical Model-Based Motion Estimation , 1992, ECCV.

[10]  Edward M. Riseman,et al.  TextFinder: An Automatic System to Detect and Recognize Text In Images , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[11]  Wei-Yin Loh,et al.  Classification and regression trees , 2011, WIREs Data Mining Knowl. Discov..

[12]  Wonjun Kim,et al.  A New Approach for Overlay Text Detection and Extraction From Complex Video Scene , 2009, IEEE Transactions on Image Processing.

[13]  Leo Breiman,et al.  Classification and Regression Trees , 1984 .

[14]  Jin Hyung Kim,et al.  Texture-Based Approach for Text Detection in Images Using Support Vector Machines and Continuously Adaptive Mean Shift Algorithm , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[15]  David S. Doermann,et al.  Automatic text detection and tracking in digital video , 2000, IEEE Trans. Image Process..

[16]  Anil K. Jain,et al.  Text information extraction in images and video: a survey , 2004, Pattern Recognit..

[17]  Rainer Lienhart,et al.  Automatic text recognition in digital videos , 1995, Electronic Imaging.

[18]  Clement T. Yu,et al.  Techniques and Systems for Image and Video Retrieval , 1999, IEEE Trans. Knowl. Data Eng..

[19]  William Bright,et al.  The Blackwell encyclopedia of writing systems By Florian Coulmas (review) , 2015 .

[20]  Anil K. Jain,et al.  Automatic text location in images and video frames , 1998, Proceedings. Fourteenth International Conference on Pattern Recognition (Cat. No.98EX170).

[21]  Shigeru Akamatsu,et al.  Recognizing Characters in Scene Images , 1994, IEEE Trans. Pattern Anal. Mach. Intell..

[22]  Cordelia Schmid,et al.  Evaluation of Interest Point Detectors , 2000, International Journal of Computer Vision.

[23]  Shih-Fu Chang,et al.  Image Retrieval: Current Techniques, Promising Directions, and Open Issues , 1999, J. Vis. Commun. Image Represent..

[24]  Beom-Joon Cho,et al.  Locating characters in scene images using frequency features , 2002, Object recognition supported by user interaction for service robots.

[25]  Xian-Sheng Hua,et al.  Automatic location of text in video frames , 2001, MULTIMEDIA '01.

[26]  Xinbo Gao,et al.  A spatial-temporal approach for video caption detection and recognition , 2002, IEEE Trans. Neural Networks.

[27]  C. Garcia,et al.  Text detection and segmentation in complex color images , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[28]  M. Smith,et al.  Video Skimming for Quick Browsing based on Audio and Image Characterization , 1995 .

[29]  Jagath Samarabandu,et al.  Multiscale Edge-Based Text Extraction from Complex Images , 2006, 2006 IEEE International Conference on Multimedia and Expo.

[30]  Hao Yan,et al.  Automatic Text Detection In Video Frames Based on Bootstrap Artificial Neural Network and CED , 2003, WSCG.

[31]  Anil K. Jain,et al.  Automatic caption localization in compressed video , 1999, Proceedings 1999 International Conference on Image Processing (Cat. 99CH36348).