Paper information - Historical recall and precision: summarizing generated hypotheses


Document recognition involves many kinds of hypotheses: segmentation hypotheses, classification hypotheses, spatial relationship hypotheses, and so on. Many recognition strategies generate valid hypotheses that are eventually rejected, but current evaluation methods consider only accepted hypotheses. As a result, we have no way to measure errors associated with rejecting valid hypotheses. We propose describing hypothesis generation in more detail by collecting the complete set of generated hypotheses and computing the recall and precision of this set; we call these the 'historical recall' and 'historical precision'. Using table cell detection examples, we demonstrate how historical recall and precision, along with the complete set of generated hypotheses, assist in the evaluation, debugging, and design of recognition strategies.
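The distinction drawn in the abstract can be illustrated with a minimal sketch. It assumes hypotheses are hashable identifiers that can be compared directly against ground truth. The cell labels, the helper `recall_precision`, and the convention of returning 1.0 for an empty set are all illustrative assumptions and are not taken from the paper.

```python
def recall_precision(hypotheses, ground_truth):
    """Return (recall, precision) of a hypothesis set against ground truth.

    Assumption: an empty denominator yields 1.0; other conventions exist.
    """
    hypotheses = set(hypotheses)
    ground_truth = set(ground_truth)
    correct = hypotheses & ground_truth
    recall = len(correct) / len(ground_truth) if ground_truth else 1.0
    precision = len(correct) / len(hypotheses) if hypotheses else 1.0
    return recall, precision


# Hypothetical table cell hypotheses, identified by made-up labels.
ground_truth = {"c1", "c2", "c3", "c4"}       # cells in the reference table
generated = {"c1", "c2", "c3", "x1", "x2"}    # every cell hypothesis ever proposed
accepted = {"c1", "c2", "x1"}                 # hypotheses kept in the final result

# Conventional evaluation looks only at the accepted hypotheses.
recall, precision = recall_precision(accepted, ground_truth)

# Historical evaluation looks at the complete set of generated hypotheses.
hist_recall, hist_precision = recall_precision(generated, ground_truth)

# Valid hypotheses that were generated but later rejected.
valid_rejected = (generated - accepted) & ground_truth

print(f"recall={recall:.2f} precision={precision:.2f}")
print(f"historical recall={hist_recall:.2f} historical precision={hist_precision:.2f}")
print(f"valid but rejected: {sorted(valid_rejected)}")
```

In this made-up example the historical recall (0.75) exceeds the conventional recall (0.50). The gap comes from `c3`, a valid cell hypothesis that was generated and then rejected, which is the kind of error the abstract says conventional evaluation cannot measure.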

Richard Zanibbi | James R. Cordy | Dorothea Blostein
