Text/Graphics Separation Revisited

Text/graphics separation aims at segmenting a document image into two layers: one assumed to contain text and one containing graphical objects. In this paper, we present a consolidation of the method proposed by Fletcher and Kasturi [12], with a number of improvements that make it more suitable for graphics-rich documents. We discuss how to choose the thresholds of this method and how stable these choices are. We also propose a post-processing step that retrieves text components touching the graphics, through local segmentation of the distance skeleton.
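The first stage of a Fletcher and Kasturi style separation filters connected components by size and shape before any grouping into strings. The sketch below is a minimal, hypothetical illustration of that filtering step only, not the method of this paper. It assumes 8-connectivity, bounding-box areas, an area threshold of the form T = n · max(A_mp, A_avg), where A_mp is the most frequent component area and A_avg the mean area, and an additional cap on the elongation ratio. The function names `connected_components` and `separate` and the defaults `n=5.0` and `max_elongation=20.0` are assumptions chosen for illustration. String grouping (Fletcher and Kasturi use a Hough transform for this) and the skeleton-based post-processing are not shown.

```python
from collections import Counter, deque


def connected_components(img):
    """Return the 8-connected foreground components of a binary image.

    img is a list of rows of 0/1 values; each component is a list of (row, col).
    """
    h, w = len(img), (len(img[0]) if img else 0)
    seen = [[False] * w for _ in range(h)]
    components = []
    for r in range(h):
        for c in range(w):
            if not img[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def separate(img, n=5.0, max_elongation=20.0):
    """Hypothetical size/shape filter: split components into (text, graphics).

    Assumed rule: a component is graphics if its bounding-box area exceeds
    n * max(most populated area, average area) or if its elongation ratio
    exceeds max_elongation; otherwise it is kept as a text candidate.
    """
    components = connected_components(img)
    if not components:
        return [], []
    dims = []
    for pixels in components:
        rows = [y for y, _ in pixels]
        cols = [x for _, x in pixels]
        dims.append((max(rows) - min(rows) + 1, max(cols) - min(cols) + 1))
    areas = [hgt * wid for hgt, wid in dims]  # bounding-box areas
    a_mp = Counter(areas).most_common(1)[0][0]  # most populated area (exact-value bins)
    a_avg = sum(areas) / len(areas)  # average area
    threshold = n * max(a_mp, a_avg)
    text, graphics = [], []
    for pixels, (hgt, wid), area in zip(components, dims, areas):
        elongation = max(hgt, wid) / min(hgt, wid)
        if area > threshold or elongation > max_elongation:
            graphics.append(pixels)
        else:
            text.append(pixels)
    return text, graphics


if __name__ == "__main__":
    page = [[0] * 40 for _ in range(12)]
    for c0 in (1, 6, 11, 16):  # four 3x3 "characters"
        for r in range(1, 4):
            for c in range(c0, c0 + 3):
                page[r][c] = 1
    page[8] = [1] * 40  # one long horizontal rule
    text, graphics = separate(page)
    print(len(text), len(graphics))  # 4 1
```

In the usage example, the four squares stay in the text layer, while the rule goes to the graphics layer: its area (40) is below T = 5 · 15.2 = 76, but its elongation ratio of 40 exceeds the cap. A real implementation would bin the area histogram rather than count exact areas, and the choice of n and of the elongation cap is exactly the kind of threshold whose stability the paper discusses.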

[1] Rangachar Kasturi et al. Improved Directional Morphological Operations for Separation of Characters from Maps/Graphics, 1997, GREC.

[2] Vishal Misra et al. Detection of Horizontal Lines in Noisy Run Length Encoded Images: The FAST Method, 1995, GREC.

[3] J. M. Gloger et al. Use of the Hough transform to separate merged text/graphics in forms, 1992, Proceedings, 11th IAPR International Conference on Pattern Recognition, Vol. II, Conference B: Pattern Recognition Methodology and Systems.

[4] Hwan-Gue Cho et al. A word extraction algorithm for machine-printed documents using a 3D neighborhood graph model, 2001, International Journal on Document Analysis and Recognition.

[5] Robert M. Haralick et al. Using Area Voronoi Tessellation to Segment Characters Connected to Graphics, 2001.

[6] Christian Ah-Soon et al. A complete system for the analysis of architectural drawings, 2000, International Journal on Document Analysis and Recognition.

[7] Karl Tombre et al. Graphics Recognition Algorithms and Systems, 1997, Lecture Notes in Computer Science.

[8] Karl Tombre et al. Graphics Recognition Methods and Applications, 1995, Lecture Notes in Computer Science.

[9] Chew Lim Tan et al. Text/Graphics Separation in Maps, 2001, GREC.

[10] Francesca Cesarini et al. Automatic document classification and indexing in high-volume applications, 2001, International Journal on Document Analysis and Recognition.

[11] Zhaoyang Lu et al. Detection of Text Regions From Digital Engineering Drawings, 1998, IEEE Trans. Pattern Anal. Mach. Intell.

[12] Rangachar Kasturi et al. A Robust Algorithm for Text String Separation from Mixed Text/Graphics Images, 1988, IEEE Trans. Pattern Anal. Mach. Intell.

[13] Chew Lim Tan et al. Separation of overlapping text from graphics, 2001, Proceedings of Sixth International Conference on Document Analysis and Recognition.

[14] Jiangying Zhou et al. Page segmentation and classification, 1992, CVGIP Graph. Model. Image Process.

[15] Dov Dori et al. Vector-Based Segmentation of Text Connected to Graphics in Engineering Drawings, 1996, SSPR.

[16] Harry Wechsler et al. Classification of binary document images into textual or nontextual data blocks using network models, 1995.

[17] Friedrich M. Wahl et al. Document Analysis System, 1982, IBM J. Res. Dev.

[18] Jan J. Gerbrands et al. An alternative to vectorization: decomposition of graphics into primitives, 1994.

[19] Gabriella Sanniti di Baja. Well-Shaped, Stable, and Reversible Skeletons from the (3, 4)-Distance Transform, 1994, J. Vis. Commun. Image Represent.

[20] Toru Kaneko. Line structure extraction from line-drawing images, 1992, Pattern Recognit.

[21] George Nagy et al. Hierarchical Representation of Optically Scanned Documents, 1984.