Comprehensive color segmentation system for noisy digitized documents to enhance text extraction

This paper presents a novel, general-purpose, multi-application color segmentation system that produces optimal chromatic and achromatic layers and filters out hue and illumination distortions with minimal information loss. A text extraction method based on the resulting segmentation is proposed to illustrate the usefulness of the system. The system is validated by evaluating the line segmentation performance of a well-known commercial OCR engine on the processed images.
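To make the idea of chromatic and achromatic layers concrete, the sketch below splits pixels into the two groups with a simple saturation and brightness test in HSV space. This is an illustrative assumption, not the method of the paper: the function name `split_chromatic_achromatic` and both threshold values are hypothetical, and the abstract does not state how the actual system performs the separation.

```python
import colorsys

def split_chromatic_achromatic(pixels, sat_threshold=0.2, val_threshold=0.2):
    """Split RGB pixels (tuples of 0-255 ints) into two lists of indices.

    A pixel is treated as achromatic when its HSV saturation or value
    falls below the given thresholds (hypothetical values, chosen only
    for illustration); every other pixel is treated as chromatic.
    """
    chromatic, achromatic = [], []
    for i, (r, g, b) in enumerate(pixels):
        # colorsys expects channel values in the range [0, 1].
        _, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        if s < sat_threshold or v < val_threshold:
            achromatic.append(i)   # black, white, or gray: hue is unreliable
        else:
            chromatic.append(i)    # hue carries usable information
    return chromatic, achromatic

# Black, white, mid-gray, a red ink, and a blue ink.
pixels = [(0, 0, 0), (255, 255, 255), (128, 128, 128), (200, 30, 30), (30, 60, 200)]
chromatic, achromatic = split_chromatic_achromatic(pixels)
```

In this toy input the three gray-level pixels land in the achromatic group and the two inks in the chromatic group. Fixed thresholds of this kind are fragile on noisy scans, which is the sort of hue and illumination distortion the abstract says the system is designed to filter.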
