Automatic text detection and tracking in digital video

Text that appears in a scene or is graphically overlaid on video can provide an important supplemental source of index information, as well as clues for decoding the video's structure and for classification. In this work, we present algorithms for detecting and tracking text in digital video. Our system implements a scale-space feature extractor that feeds an artificial neural processor to detect text blocks. Our text tracking scheme consists of two modules: a sum-of-squared-differences (SSD) based module that finds the initial position and a contour-based module that refines it. Experiments conducted with a variety of video sources show that our scheme can detect and track text robustly.
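
To make the first tracking stage concrete, the following is a minimal sketch of generic SSD block matching. It is not the authors' implementation. It assumes grayscale frames stored as lists of rows, a text block template taken from the previous frame, and a small search radius around the previous position. The names `ssd`, `ssd_search`, and `radius` are hypothetical, and the contour-based refinement stage is not shown.

```python
def ssd(frame, template, top, left):
    """Sum of squared differences between the template and the frame patch at (top, left)."""
    total = 0
    for r, row in enumerate(template):
        for c, value in enumerate(row):
            diff = frame[top + r][left + c] - value
            total += diff * diff
    return total


def ssd_search(frame, template, prev_top, prev_left, radius=2):
    """Return (top, left, score) of the lowest-SSD placement near the previous position.

    The search window is +/- radius pixels (an assumed parameter), clipped so the
    template always lies fully inside the frame.
    """
    th, tw = len(template), len(template[0])
    fh, fw = len(frame), len(frame[0])
    best = None
    for top in range(max(0, prev_top - radius), min(fh - th, prev_top + radius) + 1):
        for left in range(max(0, prev_left - radius), min(fw - tw, prev_left + radius) + 1):
            score = ssd(frame, template, top, left)
            if best is None or score < best[2]:
                best = (top, left, score)
    return best


# Toy example: a 2x2 "text block" that moved from (2, 1) to (3, 2) in a 6x6 frame.
template = [[9, 5],
            [5, 9]]
frame = [[0] * 6 for _ in range(6)]
frame[3][2], frame[3][3] = 9, 5
frame[4][2], frame[4][3] = 5, 9

result = ssd_search(frame, template, prev_top=2, prev_left=1, radius=2)
print(result)  # (3, 2, 0)
```

An exhaustive search like this costs on the order of (2·radius + 1)² template comparisons per frame, so in a sketch of this kind the radius would be kept small and the result handed to a second stage for refinement.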
