Why table ground-truthing is hard

The principle that for every document analysis task there exists a mechanism for creating well-defined ground-truth is a widely held tenet. Past experience with standard datasets providing ground-truth for character recognition and page segmentation tasks supports this belief. In the process of attempting to evaluate several table recognition algorithms we have been developing, however, we have uncovered a number of serious hurdles connected with the ground-truthing of tables. This problem may, in fact, be much more difficult than it appears. We present a detailed analysis of why table ground-truthing is so hard, including the notions that there may exist more than one acceptable "truth" and/or incomplete or partial "truths".
