Word Grouping in Document Images Based on Voronoi Tessellation

Voronoi tessellation of image elements provides an intuitive and appealing definition of proximity, and it has been suggested as an effective tool for describing the relations among neighboring objects in a digital image. This paper presents a Voronoi-tessellation-based method for grouping words in document images. Voronoi neighborhoods are derived from the tessellation and encode the adjacency relations and distances between neighboring connected components; word grouping is then carried out on the basis of this information. The proposed method has been evaluated on a variety of document images. The experimental results show that it achieves high accuracy and is robust to variations in font type, style, and size, to skew, and to text orientation.
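The abstract states only that grouping is driven by Voronoi neighborhoods of connected components and the distances between them. The sketch below illustrates that idea under assumptions made purely for illustration: it builds a discrete area Voronoi tessellation by brute force (each pixel takes the label of its nearest foreground pixel), treats two components as neighbors when their regions touch, and merges neighbors whose minimum pixel gap is at most a fixed `gap_threshold`. The function names and the fixed threshold are hypothetical; the paper's own grouping criterion is not given here.

```python
import math
from collections import deque
from itertools import product


def label_components(img):
    """Label 8-connected foreground ('#') pixels; return {label: [(row, col), ...]}."""
    rows, cols = len(img), len(img[0])
    labels = [[-1] * cols for _ in range(rows)]
    comps = {}
    for r, c in product(range(rows), range(cols)):
        if img[r][c] != "#" or labels[r][c] != -1:
            continue
        lab = len(comps)
        comps[lab] = []
        labels[r][c] = lab
        queue = deque([(r, c)])
        while queue:  # breadth-first flood fill of one component
            y, x = queue.popleft()
            comps[lab].append((y, x))
            for dy, dx in product((-1, 0, 1), repeat=2):
                ny, nx = y + dy, x + dx
                if (0 <= ny < rows and 0 <= nx < cols
                        and img[ny][nx] == "#" and labels[ny][nx] == -1):
                    labels[ny][nx] = lab
                    queue.append((ny, nx))
    return comps


def voronoi_neighbors(img, comps):
    """Discrete area Voronoi: each pixel gets the label of its nearest foreground
    pixel; components whose regions share a 4-adjacent border are neighbors."""
    rows, cols = len(img), len(img[0])
    fg = [(lab, p) for lab, pts in comps.items() for p in pts]
    region = [[min(fg, key=lambda t: (t[1][0] - r) ** 2 + (t[1][1] - c) ** 2)[0]
               for c in range(cols)] for r in range(rows)]
    pairs = set()
    for r, c in product(range(rows), range(cols)):
        for nr, nc in ((r + 1, c), (r, c + 1)):
            if nr < rows and nc < cols and region[r][c] != region[nr][nc]:
                pairs.add(tuple(sorted((region[r][c], region[nr][nc]))))
    return pairs


def component_distance(a, b):
    """Minimum Euclidean distance between the pixels of two components."""
    return min(math.dist(p, q) for p in a for q in b)


def group_words(img, gap_threshold=2.5):
    """Hypothetical grouping rule: union Voronoi neighbors closer than gap_threshold."""
    comps = label_components(img)
    parent = list(range(len(comps)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in voronoi_neighbors(img, comps):
        if component_distance(comps[a], comps[b]) <= gap_threshold:
            parent[find(a)] = find(b)
    groups = {}
    for lab in comps:
        groups.setdefault(find(lab), []).append(lab)
    return sorted(groups.values())


# Five one-column "characters": three close together, a wide gap, then two more.
page = ["#.#.#.....#.#",
        "#.#.#.....#.#"]
print(group_words(page))  # → [[0, 1, 2], [3, 4]]
```

Because neighbors come from the tessellation rather than from scanning along rows, the transposed (vertical) version of the same example is grouped identically, which loosely mirrors the orientation independence the abstract reports. The brute-force nearest-pixel search is quadratic in image size and is only meant for toy inputs.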
