A Dual PSO-Adaptive Mean Shift for Preprocessing Optimization on Degraded Document Images

Over the past two decades, solving complex search and optimization problems with bio-inspired metaheuristic algorithms has received considerable attention from researchers. In this paper, image preprocessing is treated as an optimization problem, and the Particle Swarm Optimization (PSO) algorithm is chosen to solve it by selecting the best parameters. Document image preprocessing is the basis of all subsequent steps in an Optical Character Recognition (OCR) system, such as binarization, segmentation, skew correction, layout extraction, text zone detection, and OCR itself. Without preprocessing, degradation in the image significantly reduces the performance of these steps. The authors' contribution focuses on smoothing and filtering document images using a new Adaptive Mean Shift algorithm based on the integral image. Local adaptation to image quality accelerates conventional smoothing by avoiding the processing of homogeneous zones. The authors' goal is to show how the PSO algorithm can improve both the quality of the results and the choice of parameters in document image preprocessing methods. Comparative studies and tests on an existing dataset are reported to confirm the efficiency of the proposed approach.
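To make the idea of parameter selection by PSO concrete, the following is a minimal sketch of a standard global-best PSO, not the authors' implementation. The function name `pso_minimize`, the inertia and acceleration coefficients, and the toy fitness are all assumptions introduced for illustration. In particular, `toy_fitness` is a hypothetical stand-in for a real quality measure computed on a smoothed document image, and the two tuned values are imagined as a spatial and a range bandwidth of a mean shift filter.

```python
import random


def pso_minimize(fitness, bounds, n_particles=20, n_iters=60,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO: returns (best_position, best_value) for a fitness to minimize."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Personal bests start at the initial positions.
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull toward personal and global bests.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the allowed parameter range.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_fitness(params):
    # Hypothetical stand-in for an image-quality score: a quadratic bowl
    # whose minimum is at spatial bandwidth 8 and range bandwidth 16.
    hs, hr = params
    return (hs - 8.0) ** 2 + (hr - 16.0) ** 2


# Assumed search ranges for the two illustrative parameters.
bounds = [(1.0, 32.0), (1.0, 64.0)]
best, best_val = pso_minimize(toy_fitness, bounds)
print(best, best_val)
```

In a real setting, the fitness would run the preprocessing with the candidate parameters and score the output, for example against ground truth from a benchmark dataset; that evaluation is the costly part, which is why a small swarm and a bounded number of iterations are used in this sketch.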
