From Text Detection in Videos to Person Identification

We present in this article a video OCR system that detects and recognizes overlaid texts in video as well as its application to person identification in video documents. We proceed in several steps. First, text detection and temporal tracking are performed. After adaptation of images to a standard OCR system, a final post-processing combines multiple transcriptions of the same text box. The semi-supervised adaptation of this system to a particular video type (video broadcast from a French TV) is proposed and evaluated. The system is efficient as it runs 3 times faster than real time (including the OCR step) on a desktop Linux box. Both text detection and recognition are evaluated individually and through a person recognition task where it is shown that the combination of OCR and audio (speaker) information can greatly improve the performances of a state of the art audio based person identification system.

[1]  Ricky Houghton Named Faces: Putting Names to Faces , 1999, IEEE Intell. Syst..

[2]  Ioannis Pratikakis,et al.  A two-stage scheme for text detection in video images , 2010, Image Vis. Comput..

[3]  Claude Barras,et al.  On the use of GSV-SVM for Speaker Diarization and Tracking , 2010, Odyssey.

[4]  Wolfgang Effelsberg,et al.  Automatic text segmentation and text recognition for video indexing , 2000, Multimedia Systems.

[5]  Matthieu Cord,et al.  Snoopertext: A multiresolution system for text detection in complex visual scenes , 2010, 2010 IEEE International Conference on Image Processing.

[6]  Qifeng Liu,et al.  A stroke filter and its application to text localization , 2009, Pattern Recognit. Lett..

[7]  Michael R. Lyu,et al.  A new approach for video text detection , 2002, Proceedings. International Conference on Image Processing.

[8]  Alexander G. Hauptmann,et al.  Searching for a specific person in broadcast news video , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[9]  Edward J. Delp,et al.  The indexing of persons in news sequences using audio-visual data , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[10]  Xian-Sheng Hua,et al.  A video text detection and recognition system , 2001, IEEE International Conference on Multimedia and Expo, 2001. ICME 2001..

[11]  Jean-Michel Jolion,et al.  Text localization, enhancement and binarization in multimedia documents , 2002, Object recognition supported by user interaction for service robots.

[12]  Matti Pietikäinen,et al.  Adaptive document image binarization , 2000, Pattern Recognit..

[13]  Xian-Sheng Hua,et al.  Automatic location of text in video frames , 2001, MULTIMEDIA '01.

[14]  Ioannis Pratikakis,et al.  DIBCO 2009: document image binarization contest , 2011, International Journal on Document Analysis and Recognition (IJDAR).

[15]  Takeo Kanade,et al.  Name-It: association of face and name in video , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[16]  Tanzeem Choudhury,et al.  Multimodal person recognition using unconstrained audio and video , 1998 .

[17]  Qingming Huang,et al.  A New Text Detection Algorithm in Images/Video Frames , 2004, PCM.

[18]  Georges Quénot,et al.  CLIPS at TRECVID : Shot Boundary Detection and Feature Detection , 2003, TRECVID.

[19]  Rolf Ingold,et al.  A HMM-Based Approach to Recognize Ultra Low Resolution Anti-Aliased Words , 2007, PReMI.