Paper information - A framework for the assessment of text extraction algorithms on complex colour images

The availability of open, ground-truthed datasets and clear performance metrics is crucial to the development of an application domain. The domain of colour text image analysis (real scenes, Web and spam images, scanned colour documents) has traditionally lacked a comprehensive performance evaluation framework. Such a framework is extremely difficult to specify, and the corresponding pixel-accurate ground-truth information is tedious to define. In this paper we discuss the challenges and technical issues associated with developing such a framework. We then describe a complete framework for evaluating text extraction methods at multiple levels, provide a detailed ground-truth specification, and present a case study showing how the framework can be used in a real-life situation.
