AGORA: the interactive document image analysis tool of the BVH project

In this paper, we describe how indexing metadata can be extracted from historical document images through an interactive process using a software tool called AGORA. The algorithms in AGORA use two maps to segment noisy images: a shape map that focuses on connected components, and a background map that provides information on the white areas corresponding to block separations in the page. Starting from an initial segmentation obtained with these two maps, metadata can be extracted according to scenarios produced by the users. These scenarios are defined simply during an interactive stage: the user builds processing sequences adapted to the different kinds of images they are likely to encounter and to the desired metadata. Finally, we describe the experiments carried out during the BVH project to test the usability and performance of the AGORA software.
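
To make the two-map idea more concrete, here is a minimal sketch in Python. It is not AGORA's implementation, which the abstract does not detail: the function names `shape_map` and `background_map`, the 4-connectivity, and the use of horizontal white-run lengths as the background description are all assumptions chosen for illustration. The input is a binarized page given as a list of rows, with 1 for ink and 0 for background.

```python
from collections import deque


def shape_map(img):
    """Return bounding boxes (top, left, bottom, right) of the
    4-connected ink components. Hypothetical stand-in for a shape map."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill over one component.
            queue = deque([(y, x)])
            seen[y][x] = True
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and img[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def background_map(img):
    """For each white pixel, store the length of the horizontal white
    run that contains it; ink pixels get 0. Long runs hint at block
    separations (an assumed, simplified background description)."""
    result = []
    for row in img:
        lengths = [0] * len(row)
        x = 0
        while x < len(row):
            if row[x] == 0:
                start = x
                while x < len(row) and row[x] == 0:
                    x += 1
                for i in range(start, x):
                    lengths[i] = x - start
            else:
                x += 1
        result.append(lengths)
    return result


# Tiny synthetic page: two blocks on top, a white band, one block below.
page = [
    [1, 1, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
]
boxes = shape_map(page)
white = background_map(page)
```

In this toy page, the third row is entirely white, so every pixel in that row of `white` carries the full page width; a segmentation step could treat such rows as candidate separators between the component boxes returned by `shape_map`.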
