Old document image analysis: a texture approach (Analyse d'images de documents anciens : une approche texture)

Abstract: In this article, we propose a method for characterizing images of old documents based on a texture approach. The characterization relies on a multiresolution study of the textures contained in the document images. By extracting five features linked to the frequencies and orientations found in the different areas of a page, it becomes possible to extract and compare elements of high semantic level without making any hypothesis about the physical or logical structure of the analysed documents. Experiments demonstrate the performance of our proposals and the progress they represent in characterizing the content of a deeply heterogeneous corpus.

Key words: document image analysis, texture features, multiresolution, digital libraries, indexing.
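To make the idea of orientation and frequency measurements taken at several resolutions more concrete, here is a minimal sketch in Python. It is not the method of the article: the abstract does not name its five features, so the two quantities computed here (a dominant gradient orientation from a structure tensor, and a mean gradient energy) are stand-ins chosen for illustration. The function names `downsample`, `orientation_and_energy`, and `multiresolution_descriptor`, and the block-averaging pyramid, are all assumptions.

```python
import numpy as np


def downsample(img, factor):
    """Average non-overlapping factor x factor blocks (a crude pyramid level)."""
    h, w = img.shape
    h2, w2 = h - h % factor, w - w % factor  # crop so the blocks tile exactly
    blocks = img[:h2, :w2].reshape(h2 // factor, factor, w2 // factor, factor)
    return blocks.mean(axis=(1, 3))


def orientation_and_energy(img):
    """Return (dominant gradient orientation in degrees, mean gradient energy).

    Illustrative stand-ins only, not the five features of the article.
    """
    gy, gx = np.gradient(img.astype(float))  # derivatives along rows, columns
    jxx = (gx * gx).mean()
    jyy = (gy * gy).mean()
    jxy = (gx * gy).mean()
    # Orientation of the principal axis of the averaged structure tensor.
    theta = 0.5 * np.arctan2(2.0 * jxy, jxx - jyy)
    return float(np.degrees(theta) % 180.0), float(jxx + jyy)


def multiresolution_descriptor(img, factors=(1, 2, 4)):
    """Compute the (orientation, energy) pair at each resolution level."""
    return [orientation_and_energy(downsample(img, f)) for f in factors]


# Synthetic "text line"-like texture: horizontal stripes, period 16 pixels.
rows = np.arange(64)
stripes = np.sin(2 * np.pi * rows / 16)[:, None] * np.ones((1, 64))
descriptor = multiresolution_descriptor(stripes)
```

For horizontal stripes the intensity changes only from row to row, so the dominant gradient orientation is vertical (90 degrees) at every level, while the energy value changes with the resolution. In a real setting such measurements would be taken per local window of the page rather than over the whole image.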
