A new framework for efficient and flexible analysis of the performance of document image analysis subsystems

The need for objective evaluation of the performance of image analysis algorithms is now widely acknowledged, and a number of techniques have been devised for various subsystems. In the field of document image analysis (DIA), significant activity has concentrated on evaluating OCR results. For OCR, comparing experimental results with ground truth is straightforward because both are sequences of ASCII characters, and the comparison lends itself to more elaborate analysis using string-matching theory to calculate errors and their associated costs. Consequently, OCR evaluation can be automated using large-scale test databases. Large-scale testing and evaluation is essential not only for OCR, however, but also for each of the other subsystems involved in DIA. For instance, the identification of regions of interest in the document page image (page segmentation) and of the type of their content (page classification) are significant stages that seriously affect the performance of subsequent DIA stages (e.g. OCR and document image understanding). The work described here focuses on the subsystems of the layout analysis stage, the most important of which are page segmentation and page classification. The framework described in this paper is focused mainly on performance analysis. A scoring system is also used to give developers a higher-level view of how a method performs in particular aspects. Furthermore, a global score can easily be produced for benchmarking purposes if required.
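
As an illustration of the string-matching comparison mentioned above for OCR, the following minimal sketch computes a weighted edit distance between a ground-truth string and an OCR output string, and derives a character error rate from it. The function names, the unit costs, and the sample strings are assumptions made for this example; they are not taken from the paper or from any particular evaluation tool.

```python
def edit_distance(truth: str, ocr: str,
                  sub_cost: int = 1, ins_cost: int = 1, del_cost: int = 1) -> int:
    """Weighted Levenshtein distance between ground truth and OCR output.

    The costs are illustrative assumptions; an evaluation could weight
    substitutions, insertions and deletions differently.
    """
    # prev[j] holds the cost of turning the first i-1 truth characters
    # into the first j OCR characters.
    prev = [j * ins_cost for j in range(len(ocr) + 1)]
    for i, t in enumerate(truth, start=1):
        curr = [i * del_cost]
        for j, o in enumerate(ocr, start=1):
            curr.append(min(
                prev[j] + del_cost,                       # truth character missing from OCR output
                curr[j - 1] + ins_cost,                   # spurious character in OCR output
                prev[j - 1] + (0 if t == o else sub_cost) # match or substitution
            ))
        prev = curr
    return prev[-1]


def character_error_rate(truth: str, ocr: str) -> float:
    """Edit distance with unit costs, normalised by the ground-truth length."""
    if not truth:
        raise ValueError("ground truth must not be empty")
    return edit_distance(truth, ocr) / len(truth)


if __name__ == "__main__":
    # Hypothetical example: the OCR engine reads "m" as "rn".
    print(edit_distance("document", "docurnent"))
    print(character_error_rate("document", "docurnent"))
```

No comparably simple measure exists for page segmentation and classification, where results are regions rather than character strings, which is the gap the framework addresses.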
