Word Extraction Using Area Voronoi Diagram

This paper presents a word extraction method based on the area Voronoi diagram. First, connected components are extracted from the input image. Second, noise is removed, and a special symbol detection technique identifies certain types of special symbols that lie between words. Third, based on the area Voronoi diagram, appropriate Voronoi edges separating neighboring connected components are selected. Finally, words are extracted by merging connected components according to the Voronoi edges between them. The method correctly groups words of different sizes, fonts, and arrangements, and experiments show that it achieves high accuracy.
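The four-step pipeline can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The input is assumed to be a binary image given as nested lists of booleans. The area Voronoi diagram is approximated by a multi-source breadth-first search that assigns every pixel to its nearest connected component under city-block distance, rather than by computing exact region boundaries. Voronoi edge selection is replaced by a hypothetical fixed threshold `max_gap` on the smallest gap across each shared region boundary. Noise removal is reduced to discarding components smaller than a hypothetical `min_area`, and special symbol detection is omitted. All function names are illustrative.

```python
from collections import deque

NEIGHBORS4 = ((-1, 0), (1, 0), (0, -1), (0, 1))

def connected_components(img):
    # Step 1: group 8-connected foreground pixels; returns a list of pixel lists.
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(pixels)
    return comps

def area_voronoi(h, w, comps):
    # Step 3a (approximation, assumed): multi-source BFS labels each pixel with its
    # nearest component under city-block distance; dist holds that distance.
    region = [[-1] * w for _ in range(h)]
    dist = [[0] * w for _ in range(h)]
    queue = deque()
    for cid, pixels in enumerate(comps):
        for y, x in pixels:
            region[y][x] = cid
            queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for dy, dx in NEIGHBORS4:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and region[ny][nx] == -1:
                region[ny][nx] = region[y][x]
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return region, dist

def voronoi_edges(region, dist):
    # Step 3b: for each pair of adjacent regions, keep the smallest gap across their
    # shared boundary (sum of both sides' distances to their own components).
    h, w = len(region), len(region[0])
    gaps = {}
    for y in range(h):
        for x in range(w):
            for ny, nx in ((y + 1, x), (y, x + 1)):
                if ny < h and nx < w and region[y][x] != region[ny][nx]:
                    key = tuple(sorted((region[y][x], region[ny][nx])))
                    gap = dist[y][x] + dist[ny][nx]
                    gaps[key] = min(gap, gaps.get(key, gap))
    return gaps

def extract_words(img, max_gap=2, min_area=2):
    h, w = len(img), len(img[0])
    # Step 2 (simplified): drop tiny components as noise; no special symbol detection.
    comps = [c for c in connected_components(img) if len(c) >= min_area]
    region, dist = area_voronoi(h, w, comps)
    parent = list(range(len(comps)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Step 4: merge components whose separating Voronoi edge has a small gap.
    for (a, b), gap in voronoi_edges(region, dist).items():
        if gap <= max_gap:
            parent[find(a)] = find(b)
    groups = {}
    for cid, pixels in enumerate(comps):
        groups.setdefault(find(cid), []).extend(pixels)
    # Report each word as a bounding box (top, left, bottom, right).
    boxes = []
    for pixels in groups.values():
        ys = [y for y, _ in pixels]
        xs = [x for _, x in pixels]
        boxes.append((min(ys), min(xs), max(ys), max(xs)))
    return sorted(boxes)

page = ["##.#...##.##",
        "##.#...##.##"]
img = [[ch == "#" for ch in row] for row in page]
print(extract_words(img, max_gap=2))  # [(0, 0, 1, 3), (0, 7, 1, 11)]
```

In the usage example, the four components are separated by gaps of 1, 3, and 1 pixels, so with `max_gap=2` the two narrow gaps are merged and the wide one is kept as a word boundary. A single fixed `max_gap` is a simplification; grouping words of different sizes and fonts would require a criterion that adapts to the components on either side of each Voronoi edge.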
