Character string extraction from color documents

Abstract

A new algorithm for extracting character strings from color documents is proposed. We first divide a full-color image into several representative binary color images. Character strings are then nominated from each binary image using multi-stage relaxation. However, the nominated strings are not always characters: they may be part of the background, concatenated holes of characters, dotted lines, and so on. As a result, when the nominated strings from all binary images are superimposed, some of them overlap. We therefore select the appropriate strings using the likelihood of a character string and two kinds of conflict resolution. In the experiments, we used color images such as magazine covers and posters. After color segmentation and multi-stage relaxation, many character strings were nominated, and adequate strings were then selected from among them. Finally, we present the experimental results and discuss some problems of extracting character strings from color documents.
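The decompose-then-select pipeline described above can be illustrated with a toy sketch. It assumes a fixed palette of representative colors (the abstract does not specify the color segmentation method) and assigns each pixel to its nearest palette color by squared RGB distance, producing one binary image per color. The multi-stage relaxation step is not reproduced; nominated strings are instead given as precomputed candidate boxes. The names `split_into_binary_images`, `Candidate`, `select_strings`, and the `likelihood` score are hypothetical. The single greedy rule used here (keep the most likely candidate and discard any candidate that overlaps one already kept) is a stand-in and is not the paper's two kinds of conflict resolution.

```python
from dataclasses import dataclass

Color = tuple[int, int, int]


def split_into_binary_images(image: list[list[Color]], palette: list[Color]) -> list[list[list[int]]]:
    """Assign each pixel to its nearest palette color (squared RGB distance)
    and return one binary mask per palette entry. Assumed segmentation, for illustration."""
    h, w = len(image), len(image[0])
    masks = [[[0] * w for _ in range(h)] for _ in palette]
    for y in range(h):
        for x in range(w):
            r, g, b = image[y][x]
            nearest = min(
                range(len(palette)),
                key=lambda i: (r - palette[i][0]) ** 2 + (g - palette[i][1]) ** 2 + (b - palette[i][2]) ** 2,
            )
            masks[nearest][y][x] = 1
    return masks


@dataclass(frozen=True)
class Candidate:
    layer: int                      # index of the binary image the string was nominated from
    box: tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open bounding box
    likelihood: float               # hypothetical character-string likelihood score


def overlaps(a: tuple[int, int, int, int], b: tuple[int, int, int, int]) -> bool:
    """True if two half-open boxes share at least one pixel."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1


def select_strings(candidates: list[Candidate]) -> list[Candidate]:
    """Greedy stand-in for conflict resolution: visit candidates from most to least
    likely and keep each one that does not overlap a candidate already kept."""
    kept: list[Candidate] = []
    for c in sorted(candidates, key=lambda c: c.likelihood, reverse=True):
        if all(not overlaps(c.box, k.box) for k in kept):
            kept.append(c)
    return kept


if __name__ == "__main__":
    image = [[(250, 250, 250), (10, 10, 10)], [(10, 10, 10), (240, 240, 240)]]
    palette = [(255, 255, 255), (0, 0, 0)]
    print(split_into_binary_images(image, palette))
    cands = [
        Candidate(0, (0, 0, 10, 5), 0.9),
        Candidate(1, (5, 0, 15, 5), 0.4),   # overlaps the first candidate
        Candidate(1, (0, 10, 10, 15), 0.7),
    ]
    print(select_strings(cands))
```

In this toy run, the two overlapping candidates nominated from different binary images conflict, so only the one with the higher likelihood survives, while a candidate that overlaps nothing is always kept.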
