Segmentation of envelopes and address block location by salient features and hypothesis testing

Although working systems now exist for sorting mail under some constraints, segmenting gray level images of envelopes and locating address blocks in them remains a difficult problem. Pattern recognition research has contributed greatly to this area, since the problem involves feature design, extraction, and recognition, and, when one starts from the original gray level images, image segmentation as well. This paper presents a segmentation and address block location algorithm based on feature selection in wavelet space. The aim is to separate automatically the regions of a postal envelope that correspond to background, stamps, rubber stamps, and address blocks. First, a typical image of a postal envelope is decomposed using the Mallat algorithm with a Haar basis. The outputs of the high frequency channels are analyzed to locate salient points, which are used to separate the background. A statistical hypothesis test is then applied to select the more consistent regions and remove remaining noise. The selected points are projected back onto the original gray level image, where the evidence from wavelet space seeds a growing process that adds the pixels most likely to belong to stamps, rubber stamps, and written areas. Besides the new features and the growing process controlled by the salient points, a comprehensive set of experiments was run: blocks in the envelopes were separated and classified, and the results were validated with a pixel to pixel accuracy measure against a ground truth database of 2200 images with different layouts and backgrounds. The success rate for address block location exceeds 90%.
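The first two steps described above (Haar decomposition, then salient points from the high frequency channels) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `haar_dwt2` and `salient_points`, the combination of the three detail channels into a single magnitude, and the mean-plus-`k`-standard-deviations threshold are all assumptions made for illustration, and the hypothesis test and growing process are not shown.

```python
import numpy as np


def haar_dwt2(image):
    """One level of a 2-D Haar decomposition in the Mallat filter-bank style.

    Returns the approximation channel and the three detail channels,
    each at half the resolution of the input.
    """
    img = np.asarray(image, dtype=np.float64)
    # Crop to even dimensions so that pixels pair up cleanly.
    h, w = img.shape
    img = img[: h - h % 2, : w - w % 2]

    s = np.sqrt(2.0)
    # Low-pass / high-pass across horizontally adjacent pixel pairs.
    lo = (img[:, 0::2] + img[:, 1::2]) / s
    hi = (img[:, 0::2] - img[:, 1::2]) / s
    # Repeat across vertically adjacent pairs to obtain the four channels.
    ll = (lo[0::2, :] + lo[1::2, :]) / s  # approximation
    lh = (lo[0::2, :] - lo[1::2, :]) / s  # detail channel
    hl = (hi[0::2, :] + hi[1::2, :]) / s  # detail channel
    hh = (hi[0::2, :] - hi[1::2, :]) / s  # detail channel
    return ll, lh, hl, hh


def salient_points(image, k=2.0):
    """Boolean mask (half resolution) of coefficients with strong detail.

    Assumption: a point is called salient when the combined detail
    magnitude exceeds mean + k * std over the whole image. The paper's
    actual selection rule is not given in the abstract.
    """
    _, lh, hl, hh = haar_dwt2(image)
    magnitude = np.sqrt(lh**2 + hl**2 + hh**2)
    threshold = magnitude.mean() + k * magnitude.std()
    return magnitude > threshold


if __name__ == "__main__":
    # Synthetic "envelope": bright uniform background with one dark block.
    envelope = np.full((16, 16), 200.0)
    envelope[5:9, 5:9] = 20.0
    mask = salient_points(envelope)
    print(mask.astype(int))
```

On a uniform background the detail channels are zero, so only coefficients that straddle the border of the dark block are flagged. In a full pipeline these flagged positions would be mapped back to the gray level image (each coefficient covers a 2x2 pixel cell) to seed region growing.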
