Dedicated texture based tools for characterisation of old books

This paper is part of an ambitious project to develop assistance tools that help humanists and historians retrieve information from documents. It deals with the design of a pixel classification method for ancient typewritten documents. The approach is based on multiresolution map construction and analysis: at each of five resolutions, we construct five different characterisation maps. All the maps are based on texture information (correlation of pixel orientations, grey-level pixel density, etc.). After these 25 maps are merged, each pixel of the original image is described by a vector, which allows a hierarchical classification to be computed. To avoid issues linked to the binarization process, all maps are computed on grey-level images. The system has been tested on a CESR database of ancient printed books of the Renaissance. The classification results are evaluated through several visual classification illustrations.
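The pipeline summarised above (five maps at each of five scales, a 25-component vector per pixel, then a hierarchical classification) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and not the authors' method: the five maps (local mean, local variance, and three gradient statistics) are simple stand-ins for the texture measures named in the abstract, the five "resolutions" are approximated by five window radii rather than by resampling the image, and the hierarchical step is a naive centroid-linkage agglomerative clustering run on a random sample of pixels. The function names are hypothetical.

```python
import numpy as np

def box_mean(img, radius):
    """Mean over each pixel's (2*radius+1)^2 window, with edge padding."""
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    # Integral image with a leading row and column of zeros.
    ii = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    ii[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    h, w = img.shape
    total = ii[k:k + h, k:k + w] - ii[:h, k:k + w] - ii[k:k + h, :w] + ii[:h, :w]
    return total / (k * k)

def texture_maps(img, radius):
    """Five placeholder texture maps at one scale (assumed, not from the paper)."""
    gy, gx = np.gradient(img)  # derivatives along rows, then columns
    mean = box_mean(img, radius)
    variance = np.maximum(box_mean(img * img, radius) - mean ** 2, 0.0)
    return [
        mean,                                # grey-level density
        variance,                            # local contrast
        box_mean(np.abs(gx), radius),        # horizontal transitions
        box_mean(np.abs(gy), radius),        # vertical transitions
        box_mean(np.hypot(gx, gy), radius),  # gradient magnitude
    ]

def pixel_features(img, radii=(1, 2, 4, 8, 16)):
    """Stack 5 maps x 5 scales into an (h, w, 25) array of per-pixel vectors."""
    img = np.asarray(img, dtype=float)
    maps = [m for r in radii for m in texture_maps(img, r)]
    return np.stack(maps, axis=-1)

def agglomerate(points, n_clusters):
    """Naive centroid-linkage agglomerative clustering; returns the centroids."""
    centroids = [np.asarray(p, dtype=float) for p in points]
    sizes = [1] * len(centroids)
    while len(centroids) > n_clusters:
        c = np.array(centroids)
        d = np.linalg.norm(c[:, None, :] - c[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        i, j = np.unravel_index(np.argmin(d), d.shape)
        size = sizes[i] + sizes[j]
        merged = (sizes[i] * centroids[i] + sizes[j] * centroids[j]) / size
        # Remove the two merged clusters (higher index first), then add the new one.
        for idx in sorted((int(i), int(j)), reverse=True):
            del centroids[idx]
            del sizes[idx]
        centroids.append(merged)
        sizes.append(size)
    return np.array(centroids)

def classify_pixels(img, n_classes=3, n_samples=200, seed=0):
    """Label every pixel with the index of its nearest cluster centroid."""
    feats = pixel_features(img)
    flat = feats.reshape(-1, feats.shape[-1])
    # Standardise each feature so that no single map dominates the distance.
    flat = (flat - flat.mean(axis=0)) / (flat.std(axis=0) + 1e-9)
    rng = np.random.default_rng(seed)
    picked = rng.choice(len(flat), size=min(n_samples, len(flat)), replace=False)
    centroids = agglomerate(flat[picked], n_classes)
    dists = np.linalg.norm(flat[:, None, :] - centroids[None, :, :], axis=-1)
    return dists.argmin(axis=1).reshape(feats.shape[:2])
```

Calling `classify_pixels(page)` on a 2-D grey-level array returns an integer label map of the same shape. Working directly on grey levels mirrors the abstract's choice to avoid binarization. Clustering only a sample of pixels keeps the quadratic cost of the agglomerative step manageable, at the price of a coarser hierarchy.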
