A Ground-Truthing Tool for Layout Analysis Performance Evaluation

There is a significant need for performance evaluation of Layout Analysis methods. The greatest stumbling block is the lack of sufficient ground truth. In particular, there is currently no ground truth for evaluating the performance of page segmentation methods that deal with complex-shaped regions and documents with non-uniformly oriented regions. This paper describes a new, flexible ground-truthing tool. It is fast and easy to use because it first performs page segmentation to obtain an initial description of regions. The ground-truthing system then allows each of the region outlines obtained from page segmentation to be edited (merged, split, or altered in shape). The resulting ground-truth regions are described in terms of isothetic polygons to ensure flexibility and wide applicability. The system also provides for labelling each ground-truth region according to the type of its content and its logical function. The former can be used to evaluate page classification, while the latter can be used in assessing logical layout structure extraction.
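To make the region description concrete, the following is a minimal sketch of how a ground-truth region outlined by an isothetic polygon (one whose edges are all horizontal or vertical) and carrying the two kinds of label might be represented. It is an illustration under stated assumptions, not the tool's actual data model: the names `GroundTruthRegion`, `is_isothetic`, `polygon_area`, and the label values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y) in pixel coordinates (assumed convention)


def is_isothetic(vertices: List[Point]) -> bool:
    """Return True if every edge, including the closing one, is axis-parallel."""
    n = len(vertices)
    if n < 4:  # a closed polygon with only axis-parallel edges needs >= 4 vertices
        return False
    for i in range(n):
        (x1, y1), (x2, y2) = vertices[i], vertices[(i + 1) % n]
        # An axis-parallel edge changes exactly one coordinate; this also
        # rejects zero-length edges, where neither coordinate changes.
        if (x1 == x2) == (y1 == y2):
            return False
    return True


def polygon_area(vertices: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    n = len(vertices)
    total = 0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


@dataclass
class GroundTruthRegion:
    """Hypothetical record for one ground-truth region."""
    outline: List[Point]      # isothetic polygon vertices, in order
    content_type: str         # e.g. "text", "graphics" (illustrative values)
    logical_function: str     # e.g. "heading", "caption" (illustrative values)

    def __post_init__(self) -> None:
        if not is_isothetic(self.outline):
            raise ValueError("outline must be an isothetic polygon")


# An L-shaped text region, which a single bounding rectangle could not describe.
l_shape = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 5), (0, 5)]
region = GroundTruthRegion(l_shape, content_type="text", logical_function="paragraph")
```

Restricting outlines to axis-parallel edges keeps validation and geometric checks simple while still allowing non-rectangular shapes such as the L-shaped region above.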
