Wavelet Based Page Segmentation

Page segmentation produces a description of the spatial extent and position of the various components on a document page. In this paper, we present an approach to segmenting a general document page image using wavelets. The method uses an orthonormal wavelet decomposition to extract attributes of the document that are spread over different scales. We have devised a scheme for parameterising the font size of text and for distinguishing between text and non-text regions in the document. Based on these, a segmentation algorithm has been implemented and evaluated through extensive testing.
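To make the idea of attributes spread over different scales concrete, the following is a minimal sketch, not the algorithm of this paper. It assumes the Haar wavelet as the orthonormal basis (the abstract does not name one), and the function names `haar_step` and `detail_energy_by_scale` are hypothetical. It computes the energy held in the detail bands at each decomposition level, a quantity that one might expect to peak at finer levels for small strokes and at coarser levels for large ones.

```python
import numpy as np


def haar_step(image):
    """One level of the 2D orthonormal Haar transform (assumed basis).

    Returns the approximation band and a tuple of three detail bands.
    The image height and width must both be even.
    """
    a = image[0::2, 0::2]  # top-left sample of each 2x2 block
    b = image[0::2, 1::2]  # top-right
    c = image[1::2, 0::2]  # bottom-left
    d = image[1::2, 1::2]  # bottom-right
    # The factor 1/2 makes the 4-sample transform orthonormal,
    # so the total energy of each 2x2 block is preserved.
    approx = (a + b + c + d) / 2.0
    across_rows = (a + b - c - d) / 2.0  # change from top row to bottom row
    across_cols = (a - b + c - d) / 2.0  # change from left column to right
    diagonal = (a - b - c + d) / 2.0
    return approx, (across_rows, across_cols, diagonal)


def detail_energy_by_scale(image, levels):
    """Detail-band energy at each level, finest first (hypothetical helper).

    Returns (energies, final_approximation). The image side lengths
    must be divisible by 2**levels.
    """
    approx = np.asarray(image, dtype=float)
    energies = []
    for _ in range(levels):
        approx, details = haar_step(approx)
        energies.append(sum(float(np.sum(band ** 2)) for band in details))
    return energies, approx


if __name__ == "__main__":
    rows = np.arange(16)
    # Synthetic stand-ins for small and large strokes: row stripes
    # with a period of 2 pixels and of 4 pixels.
    fine = np.tile((rows % 2)[:, None], (1, 16))
    coarse = np.tile(((rows // 2) % 2)[:, None], (1, 16))
    print(detail_energy_by_scale(fine, 3)[0])
    print(detail_energy_by_scale(coarse, 3)[0])
```

On these synthetic stripes, the detail energy concentrates at the first level for the 2-pixel period and at the second level for the 4-pixel period. The level at which the energy peaks is therefore a crude indicator of stroke scale; how the paper actually parameterises font size or separates text from non-text is not specified in this abstract.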
