Salient Features and Hypothesis Testing: evaluating a novel approach for segmentation and address block location

This paper presents a modification of, and further experiments with, our segmentation algorithm based on feature selection in wavelet space [9]. The aim is to automatically separate the regions of a postal envelope that correspond to background, stamps, rubber stamps, and address blocks. First, a typical postal envelope image is decomposed using the Mallat algorithm with a Haar basis. The high-frequency channel outputs are analyzed to locate salient points, which are used to separate out the background. A statistical hypothesis test is then applied to select the more consistent regions and remove the remaining noise. The selected points are projected back onto the original gray-level image, where the evidence from the wavelet space seeds a growing process that includes the pixels most likely to belong to the regions of stamps, rubber stamps, and written areas. We have modified the growing process controlled by the salient points, and the results improved greatly, reaching a success rate of over 97%. Experiments were run on original postal envelopes from the Brazilian Post Office Agency; here we report results on 440 images with many different layouts and backgrounds.
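The first two steps described above (Haar decomposition, then locating salient points in the high-frequency channels) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `haar_level1` and `salient_points`, the single decomposition level, and the `mean + k * std` threshold on the summed high-frequency magnitudes are all assumptions made for illustration. The hypothesis test and the growing process are not covered.

```python
import numpy as np


def haar_level1(img):
    """One level of a 2-D Haar decomposition (Mallat filter bank, orthonormal)."""
    a = np.asarray(img, dtype=float)
    # Crop to even dimensions so pixels pair up cleanly.
    a = a[: a.shape[0] // 2 * 2, : a.shape[1] // 2 * 2]
    s = np.sqrt(2.0)
    # Low-pass / high-pass over horizontally adjacent pixels, then downsample.
    lo = (a[:, 0::2] + a[:, 1::2]) / s
    hi = (a[:, 0::2] - a[:, 1::2]) / s
    # Repeat over vertically adjacent rows to get the four channels.
    ll = (lo[0::2] + lo[1::2]) / s  # approximation
    lh = (lo[0::2] - lo[1::2]) / s  # high frequency (vertical differences)
    hl = (hi[0::2] + hi[1::2]) / s  # high frequency (horizontal differences)
    hh = (hi[0::2] - hi[1::2]) / s  # high frequency (diagonal)
    return ll, lh, hl, hh


def salient_points(img, k=2.0):
    """Return (row, col) image coordinates of strong high-frequency responses.

    ASSUMPTION: the threshold mean + k * std is an illustrative choice,
    not the selection rule used in the paper.
    """
    _, lh, hl, hh = haar_level1(img)
    energy = np.abs(lh) + np.abs(hl) + np.abs(hh)
    mask = energy > energy.mean() + k * energy.std()
    rows, cols = np.nonzero(mask)
    # Each coefficient covers a 2x2 pixel block; project back to its top-left pixel.
    return np.stack([2 * rows, 2 * cols], axis=1)


if __name__ == "__main__":
    envelope = np.zeros((8, 8))
    envelope[2, 4] = 10.0  # a single bright pixel on a flat background
    print(salient_points(envelope))
```

A flat background produces zero high-frequency response and therefore no salient points, which is the property that lets this kind of analysis separate background from printed or written regions.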

[1] Rama Chellappa, et al. Page segmentation using decision integration and wavelet packets, 1994, Proceedings of the 12th IAPR International Conference on Pattern Recognition, Vol. 3 - Conference C: Signal Processing (Cat. No. 94CH3440-5).

[2] Yuan Yan Tang, et al. Text area localization under complex-background using wavelet decomposition, 2001, Proceedings of Sixth International Conference on Document Analysis and Recognition.

[3] Stéphane Mallat, et al. A Theory for Multiresolution Signal Decomposition: The Wavelet Representation, 1989, IEEE Trans. Pattern Anal. Mach. Intell.

[4] Victor K. Y. Wu. Automatic Text Detection and Recognition, 1997.

[5] Y. Meyer, et al. Wavelets and Filter Banks, 1991.

[6] Richard G. Baraniuk, et al. Multiscale image segmentation using wavelet-domain hidden Markov models, 2001, IEEE Trans. Image Process.

[7] Robert M. Haralick, et al. Document image understanding: geometric and logical layout, 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[8] David L. Donoho, et al. De-noising by soft-thresholding, 1995, IEEE Trans. Inf. Theory.

[9] David Menotti, et al. Segmentation of postal envelopes for address block location: an approach based on feature selection in wavelet space, 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.

[10] William H. Press, et al. Numerical recipes in C, 2002.

[11] Díbio Leandro Borges, et al. Analysis of mammogram classification using a wavelet transform decomposition, 2003, Pattern Recognit. Lett.

[12] Anil K. Jain, et al. Address block location on envelopes using Gabor filters, 1992, Pattern Recognit.

[13] William H. Press, et al. Numerical Recipes in C, 2nd Edition, 1992.

[14] Anil K. Jain, et al. Address block location on complex mail pieces, 1997, Proceedings of the Fourth International Conference on Document Analysis and Recognition.