Document image characterization using a multiresolution analysis of the texture: application to old documents

In this article, we propose a method for characterizing images of old documents based on a texture approach. The characterization relies on a multiresolution analysis of the textures contained in the document images. By extracting five features related to the frequencies and orientations in the different areas of a page, it is possible to extract and compare elements of high semantic level without making any hypothesis about the physical or logical structure of the analyzed documents. Experiments based on segmentation, data analysis, and document image retrieval tools demonstrate the performance of our proposals and the advances they represent for characterizing the content of a deeply heterogeneous corpus.
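To make the idea of a multiresolution, orientation-based texture description more concrete, here is a minimal sketch in Python. It is not the method of the article: the abstract does not specify the five features, so this example substitutes a generic structure-tensor estimate (dominant gradient orientation, anisotropy, and gradient energy) computed at several resolutions. The function names `downsample`, `orientation_features`, and `multiresolution_features` are hypothetical, as is the 2x2 block-averaging pyramid.

```python
import numpy as np

def downsample(img):
    """Halve the resolution by averaging 2x2 blocks (an assumed, crude pyramid step)."""
    h, w = img.shape
    h2, w2 = h - h % 2, w - w % 2          # crop to even dimensions
    img = img[:h2, :w2]
    return img.reshape(h2 // 2, 2, w2 // 2, 2).mean(axis=(1, 3))

def orientation_features(img):
    """Return (dominant gradient orientation in radians, anisotropy in [0, 1], gradient energy).

    These come from a structure tensor averaged over the whole patch; they are
    stand-ins for the article's features, which the abstract does not detail.
    """
    gy, gx = np.gradient(img.astype(float))  # derivatives along rows (y) and columns (x)
    jxx = (gx * gx).mean()
    jyy = (gy * gy).mean()
    jxy = (gx * gy).mean()
    theta = 0.5 * np.arctan2(2.0 * jxy, jxx - jyy)
    energy = jxx + jyy
    if energy > 0:
        anisotropy = np.sqrt((jxx - jyy) ** 2 + 4.0 * jxy ** 2) / energy
    else:
        anisotropy = 0.0                     # flat patch: no preferred orientation
    return theta, anisotropy, energy

def multiresolution_features(img, levels=3):
    """Compute the orientation features at `levels` successive resolutions."""
    feats = []
    current = img.astype(float)
    for _ in range(levels):
        feats.append(orientation_features(current))
        current = downsample(current)
    return feats

if __name__ == "__main__":
    # Synthetic patch with horizontal stripes, loosely imitating lines of text.
    y = np.arange(64).reshape(-1, 1)
    stripes = np.sin(2 * np.pi * y / 16.0) * np.ones((1, 64))
    for level, (theta, anisotropy, energy) in enumerate(multiresolution_features(stripes)):
        print(level, theta, anisotropy, energy)
```

In a page-level setting, such features would typically be computed on a sliding window rather than on the whole image, so that each area of the page receives its own descriptor; the window size and the number of levels are free parameters in this sketch.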
