Performance analysis of document image analysis subsystems
As a number of different document image analysis (DIA) algorithms begin to mature, there is a significant need for objective evaluation and analysis of their performance. Much of the activity so far has concentrated on evaluating OCR results. There, the nature of the ground-truth data (ASCII characters) lends itself to detailed analysis using string-matching theory to calculate errors and their associated costs. Consequently, it has already been possible to automate OCR evaluation using large-scale test databases. Large-scale testing and evaluation is essential not only for OCR but also for each of the other subsystems involved in DIA. This paper presents a new performance analysis framework that focuses on the subsystems comprising the layout analysis stage of DIA, the most significant of which are page segmentation and classification. A critical overview of previous approaches to performance analysis for these subsystems is presented. Subsequently, the concept and current state of work towards a new performance analysis framework developed at the University of Liverpool are presented.
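To illustrate why character ground truth makes OCR evaluation easy to automate, the sketch below computes a weighted edit (Levenshtein) distance between a ground-truth string and an OCR output string by dynamic programming. This is only an illustration of the general string-matching idea mentioned above, not the cost model used in any particular evaluation tool. The function name `ocr_edit_cost` and the default unit costs for substitutions, insertions, and deletions are assumptions chosen for the example.

```python
def ocr_edit_cost(truth: str, ocr: str, sub: int = 1, ins: int = 1, dele: int = 1) -> int:
    """Minimum cost of turning the ground-truth string into the OCR output.

    Hypothetical cost model (an assumption, not taken from the paper):
      sub  - an OCR character differs from the ground-truth character
      ins  - the OCR output contains an extra character
      dele - a ground-truth character is missing from the OCR output
    """
    m, n = len(truth), len(ocr)
    # prev[j] holds the cost of matching truth[:i-1] against ocr[:j].
    prev = [j * ins for j in range(n + 1)]
    for i in range(1, m + 1):
        curr = [i * dele] + [0] * n
        for j in range(1, n + 1):
            diagonal = prev[j - 1] + (0 if truth[i - 1] == ocr[j - 1] else sub)
            curr[j] = min(
                diagonal,           # match or substitution
                prev[j] + dele,     # ground-truth character dropped by OCR
                curr[j - 1] + ins,  # spurious character produced by OCR
            )
        prev = curr
    return prev[n]


if __name__ == "__main__":
    # "ni" misread as "m": one substitution plus one deletion.
    print(ocr_edit_cost("recognition", "recogmtion"))  # 2
```

Because the result is a single number per string pair, it can be summed over an entire test database without human intervention. Layout analysis results (regions on a page) do not reduce to one-dimensional strings so directly, which is why segmentation and classification need a different evaluation approach.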