TextFinder: An Automatic System to Detect and Recognize Text In Images

A robust system is proposed to automatically detect and extract text in images from different sources, including video, newspapers, advertisements, stock certificates, photographs, and checks. Text is first detected using multiscale texture segmentation and spatial cohesion constraints, then cleaned up and extracted using a histogram-based binarization algorithm. An automatic performance evaluation scheme is also proposed.
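The abstract does not spell out the histogram-based binarization step, so the following is only a minimal sketch of the general idea, using Otsu's method as a stand-in rather than the authors' algorithm. The function names `otsu_threshold` and `binarize`, the 256-level grayscale assumption, and the `dark_text` flag are all illustrative assumptions, not part of the described system.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int], levels: int = 256) -> int:
    """Pick a threshold from the gray-level histogram (Otsu's method).

    Stand-in for the paper's histogram-based binarization, which the
    abstract does not specify. Pixels <= t form the lower class.
    """
    # Build the gray-level histogram.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the lower class
    sum0 = 0  # sum of intensities in the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break  # upper class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels: Sequence[int], threshold: int,
             dark_text: bool = True) -> List[int]:
    """Label text pixels 1 and background 0.

    dark_text=True assumes dark characters on a light background
    (an assumption; real text regions may have either polarity).
    """
    return [1 if (p <= threshold) == dark_text else 0 for p in pixels]


if __name__ == "__main__":
    # Hypothetical text region: dark strokes (~10) on a light background (~200).
    region = [10] * 50 + [200] * 50
    t = otsu_threshold(region)
    mask = binarize(region, t)
    print(t, sum(mask))
```

In a pipeline like the one described, a threshold of this kind would be computed separately for each detected text region rather than once for the whole image, since text and background intensities can differ from region to region.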
