Text/graphic labelling of ancient printed documents

This paper presents a text/graphic labelling method for ancient printed documents. Our approach is based on extracting and quantifying the various orientations present in images of ancient printed documents. The documents are first cut into normalized square windows, in which we analyze the significant orientations with a directional rose. Each kind of information (textual or graphical) is identified and labelled by its characteristic orientation distribution. This choice of characterization allows us to separate textual regions from graphics while minimizing a priori knowledge. The evaluation of our proposal relies on a page classification using layout extraction criteria. The system has been tested on several ancient printed books from the Renaissance.
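The windowing and orientation analysis described above can be illustrated with a minimal sketch. The abstract does not specify how the directional rose is computed, so this sketch substitutes a magnitude-weighted histogram of gradient orientations as a stand-in. The function names `split_windows` and `orientation_rose`, the window size, and the number of angular bins are all hypothetical choices made for illustration, not the authors' method.

```python
import math


def split_windows(image, size):
    """Cut a 2-D list of gray levels into non-overlapping size x size windows.

    Returns a dict mapping the (top, left) corner of each window to its pixels.
    Incomplete windows at the right and bottom borders are dropped (assumption).
    """
    rows, cols = len(image), len(image[0])
    windows = {}
    for top in range(0, rows - size + 1, size):
        for left in range(0, cols - size + 1, size):
            windows[(top, left)] = [
                row[left:left + size] for row in image[top:top + size]
            ]
    return windows


def orientation_rose(window, bins=8):
    """Normalized histogram of gradient orientations over [0, pi).

    Stand-in for a directional rose: each interior pixel votes for the
    orientation of its central-difference gradient, weighted by magnitude.
    """
    rose = [0.0] * bins
    for y in range(1, len(window) - 1):
        for x in range(1, len(window[0]) - 1):
            gx = window[y][x + 1] - window[y][x - 1]
            gy = window[y + 1][x] - window[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue  # flat area: no orientation information
            theta = math.atan2(gy, gx) % math.pi  # fold opposite directions
            rose[int(theta / math.pi * bins) % bins] += magnitude
    total = sum(rose)
    return [value / total for value in rose] if total else rose
```

Under these assumptions, a window containing a single straight edge concentrates all of its weight in one bin, while a window with strokes in many directions spreads its weight across bins. A labelling step would then compare such distributions between windows; the decision rule itself is not given in the abstract and is left out here.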
