Automatic Feature Extraction on Pages of Antique Books Through a Mathematical Morphology Based Methodology

This paper presents a methodology based on mathematical morphology for identifying and extracting several components of antique printed books in order to build metadata automatically. These components were previously classified into five sets (drop capitals, stripes, figures, annotations and text matter), each characterised by particular geometric features. Building on that assumption, several novel algorithms relying on morphological operators are proposed. The methodology is evaluated on pages of sixteenth-century books.

Key-words: Digital antique books, mathematical morphology, geometric features, feature extraction