Location of title and author regions in document images based on the Delaunay triangulation

Abstract Automatically locating the title and author regions can be a crucial step in systems that process journal document images. This paper presents a method based on the Delaunay triangulation for identifying the title and author areas in a technical document image. The positions and alignments of small text line regions are measured using different groups of triangles, and the character stroke widths are calculated from the constrained Delaunay triangulation. Rules that define the spatial features and font attributes of the title and author regions are then applied to single-line text regions to extract those regions. Our experimental results show that the proposed method is effective.
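The rule-based step described in the abstract can be illustrated with a minimal sketch. It is not the authors' rule set: the `TextLine` fields, the `locate_title_and_authors` function, and the thresholds `top_fraction`, `size_ratio`, and `height_tolerance` are all hypothetical, and the per-line character height and stroke width are assumed to have been measured already (in the paper they come from the Delaunay triangulation, which is not reproduced here).

```python
from dataclasses import dataclass


@dataclass
class TextLine:
    """A single-line text region with pre-measured attributes (hypothetical)."""
    text: str            # label used only for illustration
    top: float           # y coordinate of the top edge, in pixels (0 = page top)
    height: float        # character height, in pixels
    stroke_width: float  # estimated character stroke width, in pixels


def locate_title_and_authors(lines, page_height,
                             top_fraction=0.4,      # assumed: search upper 40% of page
                             size_ratio=1.2,        # assumed: tolerance on "largest font"
                             height_tolerance=0.1): # assumed: author lines share a height
    """Return (title_lines, author_lines) using simple illustrative rules."""
    # Rule 1 (assumed): title and authors lie in the upper part of the page.
    candidates = [ln for ln in lines if ln.top <= top_fraction * page_height]
    if not candidates:
        return [], []

    # Rule 2 (assumed): title lines have nearly the largest character height
    # and nearly the largest stroke width among the candidates.
    max_height = max(ln.height for ln in candidates)
    max_stroke = max(ln.stroke_width for ln in candidates)
    title = [ln for ln in candidates
             if ln.height * size_ratio >= max_height
             and ln.stroke_width * size_ratio >= max_stroke]

    # Rule 3 (assumed): authors are the run of similar-height lines that
    # starts directly below the title.
    title_bottom = max(ln.top + ln.height for ln in title)
    below = sorted((ln for ln in candidates if ln.top >= title_bottom),
                   key=lambda ln: ln.top)
    authors = []
    for ln in below:
        if abs(ln.height - below[0].height) <= height_tolerance * below[0].height:
            authors.append(ln)
        else:
            break
    return title, authors
```

Calling `locate_title_and_authors(lines, page_height)` on a list of `TextLine` objects returns two lists. Combining character height with stroke width, rather than using height alone, mirrors the abstract's use of both spatial features and font attributes, but the specific thresholds here are placeholders that would need tuning on real pages.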
