Color reduction for complex document images

A new technique for color reduction of complex document images is presented in this article. It significantly reduces the number of colors in the document image (to fewer than 15 colors in most cases), so that characters are solid and local backgrounds are uniform. The technique can therefore be used as a preprocessing step in text information extraction applications. Specifically, the edge map of the document image is used to select a representative set of color samples, from which a 3D color histogram is constructed. A simple clustering procedure applied to these samples in the 3D color space yields a relatively large number of colors (usually no more than 100), and the final colors are obtained with a mean-shift-based procedure. In addition, an edge-preserving smoothing filter is applied as a preprocessing stage, which significantly enhances the quality of the initial image. Experimental results demonstrate that the method correctly segments complex color documents, so that character elements can be easily extracted as connected components.

© 2009 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 19, 14–26, 2009
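The sampling, clustering, and mean-shift stages described in the abstract can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The sampling rule in `select_samples` (keep pixels whose color barely differs from their right and lower neighbours, i.e. pixels away from edges), the uniform-grid binning used as a stand-in for the "simple clustering procedure" in `initial_colors`, and the flat-kernel mean shift with its `bandwidth`, `tol`, and merge radius are all hypothetical choices made for illustration. The edge-preserving smoothing stage is omitted.

```python
import math


def select_samples(img, grad_thresh=30):
    """ASSUMPTION: a simple stand-in for edge-map-based sampling.
    Keeps pixels whose max per-channel difference to the right and lower
    neighbours is below grad_thresh, i.e. pixels away from edges."""
    h, w = len(img), len(img[0])
    samples = []
    for y in range(h - 1):
        for x in range(w - 1):
            p = img[y][x]
            d_right = max(abs(a - b) for a, b in zip(p, img[y][x + 1]))
            d_down = max(abs(a - b) for a, b in zip(p, img[y + 1][x]))
            if max(d_right, d_down) < grad_thresh:
                samples.append(p)
    return samples


def initial_colors(samples, step=32):
    """ASSUMPTION: uniform 3D histogram binning as the 'simple clustering'.
    Returns (mean colour, sample count) for every non-empty bin."""
    sums = {}
    for c in samples:
        key = tuple(v // step for v in c)
        s = sums.setdefault(key, [0, 0, 0, 0])
        for i in range(3):
            s[i] += c[i]
        s[3] += 1
    return [((s[0] / s[3], s[1] / s[3], s[2] / s[3]), s[3]) for s in sums.values()]


def mean_shift(colors, bandwidth=40.0, tol=0.5, max_iter=100):
    """Weighted flat-kernel mean shift over (colour, weight) pairs.
    Each initial colour is shifted to a local density mode; modes closer
    than bandwidth / 2 are merged into one final colour."""
    modes = []
    for start, _ in colors:
        x = start
        for _ in range(max_iter):
            num, den = [0.0, 0.0, 0.0], 0.0
            for c, w in colors:
                if math.dist(x, c) <= bandwidth:
                    for i in range(3):
                        num[i] += w * c[i]
                    den += w
            if den == 0:  # defensive guard; a flat-kernel mean always has a neighbour
                break
            new = tuple(v / den for v in num)
            converged = math.dist(new, x) < tol
            x = new
            if converged:
                break
        modes.append(x)
    merged = []
    for m in modes:
        if all(math.dist(m, q) > bandwidth / 2 for q in merged):
            merged.append(m)
    return merged


def reduce_colors(img, palette):
    """Replace every pixel with its nearest palette colour (Euclidean RGB)."""
    return [[min(palette, key=lambda q: math.dist(p, q)) for p in row] for row in img]
```

As a usage example, a 10×10 image with a light left half and a dark right half, each built from two slightly different shades, is reduced to two colors: `reduce_colors(img, mean_shift(initial_colors(select_samples(img))))`. Running mean shift on a few hundred bin centroids rather than on every pixel is what keeps this final stage cheap, which mirrors the abstract's motivation for first clustering the samples into a bounded number of colors.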
