Two Geometric Algorithms for Layout Analysis

This paper presents geometric algorithms for solving two key problems in layout analysis: finding a cover of the background whitespace of a document in terms of maximal empty rectangles, and finding constrained maximum likelihood matches of geometric text line models in the presence of geometric obstacles. The algorithms are considerably easier to implement than prior methods, they return globally optimal solutions, and they require no heuristics. The paper also introduces an evaluation function that reliably identifies maximal empty rectangles corresponding to column boundaries. Combining this evaluation function with the two geometric algorithms results in an easy-to-implement layout analysis system. Reliability of the system is demonstrated on documents from the UW3 database.
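The whitespace-cover problem can be illustrated with a small best-first branch-and-bound search. What follows is a minimal sketch under stated assumptions, not the paper's implementation: obstacles are axis-aligned boxes `(x0, y0, x1, y1)`, the quality of a candidate rectangle is simply its area (a stand-in for the evaluation function mentioned above, which the abstract does not specify), and the pivot is the obstacle nearest the candidate's center. The names `largest_empty_rectangle`, `area`, and `overlaps` are hypothetical.

```python
import heapq
from itertools import count
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def area(r: Rect) -> float:
    """Area of a rectangle; degenerate or inverted rectangles count as 0."""
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the interiors of a and b intersect (touching edges do not count)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def largest_empty_rectangle(bound: Rect, obstacles: List[Rect]) -> Optional[Rect]:
    """Return a maximum-area rectangle inside `bound` that overlaps no obstacle.

    Assumption: area is used as the quality measure. Because the area of a
    candidate is an upper bound on the area of any empty rectangle inside it,
    the first obstacle-free candidate popped from the queue is globally optimal.
    """
    tie = count()  # tie-breaker so the heap never compares rectangles or lists
    start = [o for o in obstacles if overlaps(o, bound)]
    heap = [(-area(bound), next(tie), bound, start)]
    while heap:
        _, _, r, obs = heapq.heappop(heap)
        if not obs:
            return r  # no obstacles left inside r: r is empty and maximal
        # Pivot choice (an assumption): the obstacle closest to the center of r.
        cx, cy = (r[0] + r[2]) / 2, (r[1] + r[3]) / 2
        px0, py0, px1, py1 = min(
            obs,
            key=lambda o: ((o[0] + o[2]) / 2 - cx) ** 2 + ((o[1] + o[3]) / 2 - cy) ** 2,
        )
        # Any empty rectangle inside r lies entirely to one side of the pivot.
        candidates = [
            (r[0], r[1], px0, r[3]),  # left of pivot
            (px1, r[1], r[2], r[3]),  # right of pivot
            (r[0], r[1], r[2], py0),  # smaller-y side of pivot
            (r[0], py1, r[2], r[3]),  # larger-y side of pivot
        ]
        for s in candidates:
            if area(s) > 0:
                inside = [o for o in obs if overlaps(o, s)]
                heapq.heappush(heap, (-area(s), next(tie), s, inside))
    return None  # the obstacles cover the whole bound
```

To obtain a cover rather than a single rectangle, one plausible extension is to call the search repeatedly, adding each returned rectangle (or a shrunken copy of it) to the obstacle list so that later results do not largely duplicate earlier ones.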
