Paper information - Finding text in images

Finding text in images

There are many applications in which the automatic detection and recognition of text embedded in images is useful. These applications include digital libraries, multimedia systems, and Geographical Information Systems. When machine-generated text is printed against clean backgrounds, it can be converted to a computer-readable form (ASCII) using current Optical Character Recognition (OCR) technology. However, text is often printed against shaded or textured backgrounds or is embedded in images. Examples include maps, advertisements, photographs, videos and stock certificates. Current document segmentation and recognition technologies cannot handle these situations well. In this paper, a four-step system which automatically detects and extracts text in images is proposed. First, a texture segmentation scheme is used to focus attention on regions where text may occur. Second, strokes are extracted from the segmented text regions. Using reasonable heuristics on text strings such as height similarity, spacing and alignment, the extracted strokes are then processed to form rectangular boxes surrounding the corresponding text strings. To detect text over a wide range of font sizes, the above steps are first applied to a pyramid of images generated from the input image, and then the boxes formed at each resolution level of the pyramid are fused at the image in the original resolution level. Third, text is extracted by cleaning up the background and binarizing the detected text strings. Finally, better text bounding boxes are generated by using the binarized text as strokes. Text is then cleaned and binarized from these new boxes, and can then be passed through a commercial OCR engine for recognition if the text is of an OCR-recognizable font.
The system is stable, robust, and works well on images (with or without structured layouts) from a wide variety of sources, including digitized video frames, photographs, newspapers, advertisements, stock certificates, and personal checks. All parameters remain the same for all the experiments.

*This material is based on work supported in part by the National Science Foundation, Library of Congress and Department of Commerce under cooperative agreement number EEC9209623, in part by the United States Patent and Trademark Office and Defense Advanced Research Projects Agency/ITO under ARPA order number D468, issued by ESC/AXS contract number F19628-96-C-0235, in part by the National Science Foundation under grant number IRI-9619117 and in part by NSF Multimedia CDA-9502639. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect those of the sponsors.

DL 97, Philadelphia PA, USA. Copyright 1997 ACM 0-89791-868-1/97/7..$3.50
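The abstract names three heuristics for turning strokes into text boxes (height similarity, spacing, alignment) and says boxes found at each pyramid level are fused at the original resolution. The sketch below illustrates one way such a step could look. It is not the authors' implementation: the `Box` type, the function names, the three thresholds, and the assumption that each pyramid level halves the resolution are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical type)."""
    x0: int
    y0: int
    x1: int
    y1: int

    @property
    def height(self) -> int:
        return self.y1 - self.y0


def compatible(a: Box, b: Box, height_ratio: float = 0.6,
               gap_factor: float = 1.5, align_factor: float = 0.5) -> bool:
    """Return True if two strokes plausibly belong to the same text string.

    The three thresholds are illustrative assumptions, not values from the paper.
    """
    h_small, h_large = sorted((a.height, b.height))
    # Height similarity: strokes in one string have comparable heights.
    if h_large == 0 or h_small / h_large < height_ratio:
        return False
    # Spacing: the horizontal gap must be small relative to the stroke height.
    gap = max(a.x0, b.x0) - min(a.x1, b.x1)
    if gap > gap_factor * h_large:
        return False
    # Alignment: the vertical centers must lie roughly on one line.
    center_a = (a.y0 + a.y1) / 2
    center_b = (b.y0 + b.y1) / 2
    return abs(center_a - center_b) <= align_factor * h_large


def group_strokes(strokes: list[Box]) -> list[Box]:
    """Merge pairwise-compatible strokes into enclosing boxes (union-find)."""
    parent = list(range(len(strokes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(strokes)):
        for j in range(i + 1, len(strokes)):
            if compatible(strokes[i], strokes[j]):
                parent[find(i)] = find(j)

    groups: dict[int, list[Box]] = {}
    for i, stroke in enumerate(strokes):
        groups.setdefault(find(i), []).append(stroke)

    boxes = [
        Box(min(s.x0 for s in members), min(s.y0 for s in members),
            max(s.x1 for s in members), max(s.y1 for s in members))
        for members in groups.values()
    ]
    return sorted(boxes, key=lambda b: (b.y0, b.x0))


def to_original(box: Box, level: int) -> Box:
    """Map a box found at a pyramid level back to the original resolution.

    Assumes level 0 is the input image and each level halves the resolution.
    """
    scale = 2 ** level
    return Box(box.x0 * scale, box.y0 * scale, box.x1 * scale, box.y1 * scale)


if __name__ == "__main__":
    strokes = [Box(0, 0, 8, 10), Box(10, 0, 18, 10),
               Box(20, 1, 28, 11), Box(100, 50, 140, 90)]
    for text_box in group_strokes(strokes):
        print(text_box, "->", to_original(text_box, level=2))
```

In this sketch, boxes from every pyramid level would be passed through `to_original` and then combined in the original image; how overlapping boxes from different levels are merged is left open here because the abstract does not specify it.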

Edward M. Riseman | R. Manmatha | Victor Wu

[1] Friedrich M. Wahl, et al. Document Analysis System, 1982, IBM J. Res. Dev.

[2] Lawrence O'Gorman. Binarization and Multithresholding of Document Images Using Connectivity, 1994, CVGIP Graph. Model. Image Process.

[3] Rangachar Kasturi, et al. A Robust Algorithm for Text String Separation from Mixed Text/Graphics Images, 1988, IEEE Trans. Pattern Anal. Mach. Intell.

[4] Mahesh Viswanathan, et al. A prototype document image analysis system for technical journals, 1992, Computer.

[5] Ching Y. Suen, et al. Historical review of OCR research and development, 1992, Proc. IEEE.

[6] Lawrence O'Gorman, et al. The Document Spectrum for Page Layout Analysis, 1993, IEEE Trans. Pattern Anal. Mach. Intell.

[7] P. Perona, et al. Preattentive texture discrimination with early vision mechanisms, 1990, Journal of the Optical Society of America A, Optics and Image Science.

[8] Chris A. Glasbey, et al. An Analysis of Histogram-Based Thresholding Algorithms, 1993, CVGIP Graph. Model. Image Process.

[9] Sargur N. Srihari, et al. Classification of newspaper image blocks using texture analysis, 1989, Comput. Vis. Graph. Image Process.

[10] Øivind Due Trier, et al. Evaluation of Binarization Methods for Document Images, 1995, IEEE Trans. Pattern Anal. Mach. Intell.

[11] Jiangying Zhou, et al. Page segmentation and classification, 1992, CVGIP Graph. Model. Image Process.

[12] Mohamed S. Kamel, et al. Extraction of Binary Character/Graphics Images from Grayscale Document Images, 1993, CVGIP Graph. Model. Image Process.

[13] Lloyd A. Fletcher, et al. A Robust Algorithm for Text String Separation from Mixed Text/Graphics Images, 1988.

[14] Sargur N. Srihari, et al. Postal address block location in real time, 1992, Computer.

[15] Ken Thompson, et al. Reading Chess, 1990, IEEE Trans. Pattern Anal. Mach. Intell.

[16] Friedrich M. Wahl, et al. Block segmentation and text extraction in mixed text/image documents, 1982, Comput. Graph. Image Process.