A Method of Evaluating Table Segmentation Results Based on a Table Image Ground Truther

We propose a novel method for evaluating table segmentation results based on a table image ground truther. In the ground-truthing process, we first extract connected components from a given table image and connect them into an atom graph with weighted edges. Each edge weight takes into account the size similarity of, and the distance between, neighboring connected components. The ground truther then semi-automatically determines the locations and spans of row and column separators from projection profiles, under human supervision. We evaluate a given table segmentation by computing the edit distance between its row and column separator assertions and the ground truth. The edit distance is the sum of the costs of all edit operations needed to correct wrong row and column separators. The cost of each edit operation is a function of the sum of the weights of the edges that the separator cuts through. Separator errors therefore incur different costs depending on their severity, where severity roughly corresponds to how forgivable a human observer would consider the error. Experimental results demonstrate that the proposed evaluation method is not only efficient but also useful in formalizing the intuitive quality of different segmentations.
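To make the cost model concrete, the following is a minimal Python sketch of how a separator-based edit distance of this kind might be computed. It is not the implementation evaluated in the paper. The abstract does not give the edge-weight formula or the cost function, so everything specific here is an assumption made for illustration: the `Atom` bounding-box representation, the weight formula in `edge_weight` (size similarity divided by one plus centre distance), the `(axis, position, lo, hi)` separator tuples, the identity default for `cost`, and the choice to treat every separator found in only one of the two segmentations as a single edit.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Atom:
    """One connected component, reduced to its centre and size (hypothetical)."""
    x: float
    y: float
    w: float
    h: float


def edge_weight(a, b):
    """Assumed weight: area similarity in (0, 1] damped by centre distance."""
    area_a, area_b = a.w * a.h, b.w * b.h
    similarity = min(area_a, area_b) / max(area_a, area_b)
    return similarity / (1.0 + hypot(a.x - b.x, a.y - b.y))


def crosses(a, b, sep):
    """True if the segment joining the centres of a and b crosses separator sep."""
    axis, pos, lo, hi = sep
    # p is the coordinate perpendicular to the separator, q runs along it.
    if axis == "col":   # vertical line x = pos, spanning lo <= y <= hi
        (pa, qa), (pb, qb) = (a.x, a.y), (b.x, b.y)
    else:               # "row": horizontal line y = pos, spanning lo <= x <= hi
        (pa, qa), (pb, qb) = (a.y, a.x), (b.y, b.x)
    if (pa - pos) * (pb - pos) >= 0:
        return False    # both centres lie on the same side (or on the line)
    t = (pos - pa) / (pb - pa)
    return lo <= qa + t * (qb - qa) <= hi


def cut_weight(atoms, edges, sep):
    """Sum of the weights of all atom-graph edges that sep cuts through."""
    return sum(edge_weight(atoms[i], atoms[j])
               for i, j in edges if crosses(atoms[i], atoms[j], sep))


def edit_distance(atoms, edges, predicted, truth, cost=lambda s: s):
    """Sum the cost of every separator present in only one segmentation."""
    wrong = set(predicted) ^ set(truth)
    return sum(cost(cut_weight(atoms, edges, sep)) for sep in wrong)
```

Under these assumptions, a spurious separator that slices through closely spaced, similarly sized components (strong edges) costs more than one that passes through loosely connected ones, which is the severity-dependent behaviour the abstract describes. Passing a different `cost` callable changes how the summed cut weight maps to a penalty.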
