Document image segmentation as selection of Voronoi edges

This paper presents a method of document image segmentation for pages whose components have arbitrary shapes and arbitrary skew angles. The proposed method has two characteristics: (1) a Voronoi diagram is constructed from the connected components to obtain candidate boundaries of document components; and (2) the candidates are used to estimate the inter-character, inter-line, and inter-column gaps, without domain-specific parameters, in order to select the boundaries. Experimental results for 80 images with non-Manhattan layouts and skews of 0° to 45° confirm that the method is effective for extracting column regions and as efficient as other methods based on connected component analysis.
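
The two steps named in the abstract (build Voronoi edges between connected components, then keep only the edges that cross wide gaps) can be illustrated with a small sketch. This is not the authors' implementation: it assumes a brute-force discrete Voronoi labeling on a tiny pixel grid, and it takes the gap threshold as a fixed argument, whereas the paper estimates the gaps from the data. All function names and the toy layout are hypothetical.

```python
def label_grid(components, width, height):
    """Assign every pixel to its nearest component (discrete area Voronoi diagram).

    Ties go to the component with the lowest index.
    """
    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            nearest = min(
                range(len(components)),
                key=lambda i: min(
                    (x - px) ** 2 + (y - py) ** 2 for px, py in components[i]
                ),
            )
            row.append(nearest)
        labels.append(row)
    return labels


def voronoi_neighbors(labels):
    """Return pairs of components whose Voronoi regions touch (4-connectivity)."""
    pairs = set()
    height, width = len(labels), len(labels[0])
    for y in range(height):
        for x in range(width):
            a = labels[y][x]
            if x + 1 < width and labels[y][x + 1] != a:
                pairs.add(tuple(sorted((a, labels[y][x + 1]))))
            if y + 1 < height and labels[y + 1][x] != a:
                pairs.add(tuple(sorted((a, labels[y + 1][x]))))
    return pairs


def gap(c1, c2):
    """Minimum Euclidean distance between the pixels of two components."""
    return min(
        ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
        for x1, y1 in c1
        for x2, y2 in c2
    )


def select_edges(components, width, height, threshold):
    """Keep Voronoi edges whose components are farther apart than `threshold`.

    ASSUMPTION: a fixed threshold stands in for the paper's data-driven
    estimate of inter-character, inter-line, and inter-column gaps.
    """
    labels = label_grid(components, width, height)
    return {
        (i, j)
        for i, j in voronoi_neighbors(labels)
        if gap(components[i], components[j]) > threshold
    }


# Toy page: four two-pixel "characters" on one text line, forming two columns.
components = [
    [(1, 2), (2, 2)],    # column 1, character 0
    [(4, 2), (5, 2)],    # column 1, character 1
    [(12, 2), (13, 2)],  # column 2, character 2
    [(15, 2), (16, 2)],  # column 2, character 3
]

boundaries = select_edges(components, width=18, height=5, threshold=4.0)
print(boundaries)  # {(1, 2)}
```

In this toy layout the edges between characters 0 and 1 and between characters 2 and 3 span a gap of 2 pixels and are discarded, while the edge between characters 1 and 2 spans a gap of 7 pixels and survives as the column boundary.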
