Paper information - Automatic Feature Extraction on Pages of Antique Books Through a Mathematical Morphology Based Methodology

This paper presents a methodology based on mathematical morphology to identify and extract several components of antique printed books in order to build metadata automatically. These components were previously classified into five sets (drop capitals, stripes, figures, annotations and text matter), each characterised by particular geometric features. Based on that assumption, several novel algorithms relying on morphological operators are proposed. The methodology is evaluated on pages of 16th-century books.

Keywords: digital antique books, mathematical morphology, geometric features, feature extraction
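The abstract's central idea, separating page components by their geometric features using morphological operators, can be illustrated with a binary opening. The sketch below is not one of the paper's algorithms: the helper names (`to_grid`, `erode`, `dilate`, `opening`), the 3 x 3 structuring element and the toy page are all assumptions chosen for illustration. It shows only that an opening keeps shapes large enough to contain the structuring element (a figure-like block) and discards thinner ones (a stroke of text).

```python
def to_grid(rows):
    """Convert strings of '#' and '.' into a 0/1 grid (hypothetical helper)."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]

def erode(img, h, w):
    """Mark (r, c) when an h x w rectangle with top-left corner (r, c) fits in the foreground."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows - h + 1):
        for c in range(cols - w + 1):
            if all(img[r + i][c + j] for i in range(h) for j in range(w)):
                out[r][c] = 1
    return out

def dilate(img, h, w):
    """Paint an h x w rectangle with top-left corner at every foreground pixel."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if img[r][c]:
                for i in range(h):
                    for j in range(w):
                        if r + i < rows and c + j < cols:
                            out[r + i][c + j] = 1
    return out

def opening(img, h, w):
    """Erosion followed by the matching dilation: the union of all rectangles that fit."""
    return dilate(erode(img, h, w), h, w)

# Toy page (assumed data): a 4 x 4 "figure" block and a 1-pixel-high "text" stroke.
page = to_grid([
    "..........",
    ".####.....",
    ".####.###.",
    ".####.....",
    ".####.....",
    "..........",
])

large = opening(page, 3, 3)  # components that can contain a 3 x 3 square
small = [[p - q for p, q in zip(pr, qr)] for pr, qr in zip(page, large)]  # residue

for row in large:
    print("".join("#" if v else "." for v in row))
```

The residue `small` holds whatever the opening removed, here the thin stroke. Real pages would need grey-level preprocessing, binarization and structuring elements tuned to each component class, none of which this sketch attempts.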

Pedro Pina | Fernando Muge | Isabel Granado

[1] Norihiro Abe, et al. A clustering-based approach to the separation of text strings from mixed text/graphics documents, 1996, Proceedings of 13th International Conference on Pattern Recognition.

[2] Antoni Gasull, et al. Morphological Preprocessing and Binarization for OCR Systems, 1996, ISMM.

[3] João Rogério Caldas Pinto, et al. Automatic Feature Extraction and Recognition for Digital Access of Books of the Renaissance, 2000, ECDL.

[4] Pierre Soille, et al. Morphological Image Analysis, 1999.

[5] Its'hak Dinstein, et al. Adaptive Directional Morphology with Application to Document Analysis, 1996, ISMM.

[6] Kristel Michielsen, et al. Morphological image analysis, 2000.

[7] A. Marcolino, et al. Comparing Matching Strategies for Renaissance Printed Words, 2001.

[8] Michele Mengucci, et al. Morphological Segmentation of Text and Figures in Renaissance Books (XVI Century), 2000, ISMM.

[9] Jean Serra, et al. Image Analysis and Mathematical Morphology, 1983.